DNA AMPLIFICATION AND SEQUENCING USING DNA MOLECULES
GENERATED BY RANDOM FRAGMENTATION 

[0001] This application claims priority to U.S. Provisional Patent Application 
Serial No. 60/338,224, filed November 13, 2001, which is incorporated in its entirety by 
reference herein. 

FIELD OF THE INVENTION 
[0002] The present invention is directed to the fields of genomics, molecular
biology, and sequencing. Specifically, the present invention regards methods of preparing 
DNA molecules, preparing DNA templates for sequencing, and sequencing from randomly 
fragmented DNA molecules. 

BACKGROUND OF THE INVENTION 
[0003] DNA sequencing is the most important analytical tool for understanding 
the genetic basis of living systems. The process involves determining the positions of each of 
the four major nucleotide bases, adenine (A), cytosine (C), guanine (G), and thymine (T) 
along the DNA molecule(s) of an organism. Short sequences of DNA are usually determined 
by creating a nested set of DNA fragments that begin at a unique site and terminate at a 
plurality of positions comprised of a specific base. The fragments terminated at each of the 
four natural nucleic acid bases (A, T, G and C) are then separated according to molecular size 
in order to determine the positions of each of the four bases relative to the unique site. The 
pattern of fragment lengths caused by strands that terminate at a specific base is called a 
"sequencing ladder." The interpretation of base positions as the result of one experiment on a 
DNA molecule is called a "read." There are different methods of creating and separating the 
nested sets of terminated DNA molecules (Adams et al., 1994; Primrose, 1998; Cantor and
Smith, 1999). 
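As an illustrative sketch only, and not a description of any of the cited methods, the relationship between a sequencing ladder and a read can be expressed in a few lines of Python. The function names `ladder_from_sequence` and `read_from_ladder` and the example sequence are hypothetical. The model assumes an idealized experiment in which every position yields exactly one terminated fragment whose length is measured without error.

```python
def ladder_from_sequence(seq):
    """For each base, list the lengths of the fragments that terminate at that base.

    Lengths are measured from the unique start site (position 1).
    """
    ladder = {base: [] for base in "ACGT"}
    for position, base in enumerate(seq, start=1):
        ladder[base].append(position)
    return ladder


def read_from_ladder(ladder):
    """Rebuild the base order (the "read") by sorting all fragments by length."""
    base_at_length = {}
    for base, lengths in ladder.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


# Hypothetical example sequence, used only for illustration.
example_ladder = ladder_from_sequence("GATTACA")
example_read = read_from_ladder(example_ladder)
```

In this idealized model the four per-base length lists play the role of the four lanes of a ladder, and sorting by length corresponds to the size separation step.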

[0004] Because the amount of any specific DNA molecule that can be isolated 
from even a large number of cells is usually very small, the only practical methods to prepare 
enough DNA molecules for most applications, including sequencing, involve amplification of 
specific DNA molecules in vivo or in vitro. There are basically six general methods 
important for manipulating DNA for analysis: 1) in vivo cloning of unique fragments of 
DNA; 2) in vitro amplification of unique fragments of DNA; 3) in vivo cloning of libraries 
(mixtures) of DNA fragments; 4) in vitro preparation of random libraries of DNA fragments; 
5) in vivo cloning of ordered libraries of DNA; and 6) in vitro preparation of ordered libraries 
of DNA. The beneficial effect of amplifying mixtures of DNA is that it facilitates analysis of
large pieces of DNA (e.g., chromosomes) by creating libraries of molecules that are small
enough to be analyzed by existing techniques. For example, the largest molecule that can be
subjected to DNA sequencing methods is less than 2000 bases long, which is many orders of 
magnitude shorter than single chromosomes of organisms. Although short molecules can be 
analyzed, considerable effort is required to assemble the information from the analysis of the 
short molecules into a description of the larger piece of DNA. 

1. In vivo cloning of unique DNA 

[0005] Unique-sequence source DNA molecules can be amplified by separating 
them from other molecules (e.g., by electrophoresis), ligating them into an autonomously 
replicating genetic element (e.g., a bacterial plasmid), transfecting a host cell with the 
recombinant genetic element, and growing a clone of a single transfected host cell to produce 
many copies of the genetic element having the insert with the same unique sequence as the 
source DNA (Sambrook et al., 1989).

2. In vitro amplification of unique DNA 

[0006] There are many methods designed to amplify DNA in vitro. Usually these 
methods are used to prepare unique DNA molecules from a complex mixture, e.g., genomic 
DNA or an artificial chromosome. Alternatively, a restricted set of molecules can be 
prepared as a library that represents a subset of sequences in the complex mixture. These 
amplification methods include PCR™, rolling circle amplification, and strand displacement 
(Walker et al., 1996a; Walker et al., 1996b; U.S. Patent No. 5,648,213; U.S. Patent No.
6,124,120). 

[0007] The polymerase chain reaction (PCR™) can be used to amplify specific 
regions of DNA between two known sequences (U.S. Patent No. 4,683,195; U.S. Patent No.
4,683,202; Frohman et al., 1995). PCR™ involves the repetition of a cycle consisting of
denaturation of the source (template) DNA, hybridization of two oligonucleotide primers to
known sequences flanking the region to be amplified, and primer extension using a DNA
polymerase to synthesize strands complementary to the DNA region located between the two 
primer sites. Because the products of one cycle of amplification serve as source DNA for 
succeeding cycles, the amplification is exponential. PCR™ can synthesize large numbers of 
specific molecules quickly and inexpensively. 
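The exponential character of the amplification can be illustrated with a short calculation. This is a minimal sketch under the assumption that a fixed fraction of the templates is copied in every cycle. The function name `pcr_copies` and the `efficiency` parameter are introduced here only for illustration and do not appear in the cited references. Real reactions eventually plateau, which this model ignores.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling, 0.0 means no amplification).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# With perfect doubling, one template becomes 2**30 copies after 30 cycles.
perfect = pcr_copies(1, 30)
# A lower assumed efficiency gives a markedly smaller yield.
reduced = pcr_copies(1, 30, efficiency=0.8)
```

Under perfect doubling, 30 cycles turn a single template into 2^30 (roughly 10^9) copies, which is why small differences in per-cycle efficiency translate into large differences in final yield.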

[0008] The major disadvantages of the PCR™ method to amplify DNA are that 1) 
information about two flanking sequences must be known in order to specify the sequences of 
the primers; 2) synthesis of primers is expensive; 3) the level of amplification achieved
depends strongly on the primer sequences, source DNA sequence, and the molecular weight 
of the amplified DNA; and 4) the length of amplified DNA is usually limited to less than 5 
kb, although "long-distance" PCR™ (Cheng, 1994) allows molecules as long as 20 kb to be 
amplified. 

[0009] "One-sided PCR™" techniques are able to amplify unknown DNA 
adjacent to one known sequence. These techniques can be divided into 4 categories: a) 
ligation-mediated PCR™, facilitated by addition of a universal adaptor sequence to a 
terminus usually created by digestion with a restriction endonuclease; b) universal primer-
mediated PCR™, facilitated by a primer extension reaction initiated at arbitrary sites; c)
terminal transferase-mediated PCR™, facilitated by addition of a homonucleotide "tail" to
the 3' end of DNA fragments; and d) inverse PCR™, facilitated by circularization of the
template molecules. These techniques can be used to amplify successive regions along a
large DNA template in a process sometimes called "chromosome walking" (Hui et al., 1998).

[0010] Ligation-mediated PCR™ is practiced in many forms. Rosenthal et al.
(1990) outlined the basic process of amplifying an unknown region of DNA immediately
adjacent to a known sequence located near the end of a restriction fragment. Reiley et al.
(1990) used primers that were not exactly complementary with the adaptors in order to
suppress amplification of molecules that did not have a specific priming site. Jones (1993)
and Siebert (1995; U.S. Patent 5,565,340) used long universal primers that formed intrastrand 
"panhandle" structures that suppressed PCR™ of molecules having two universal adaptors. 
Arnold (1994) used "vectorette" primers having unpaired central regions to increase the 
specificity of one-sided PCR™. Macrae and Brenner (1994) amplified short inserts from a 
Fugu genomic clone library using nested primers from a specific sequence and from vector 
sequences. Lin et al. (1995) ligated an adaptor to restriction fragment ends that had an 
overhanging 5' end and employed hot-start PCR™ with a single universal anchor primer and 
nested specific-site primers to specifically amplify human sequences. Liao et al. (1997) used
two specific site primers and 2 universal adaptors, one of which had a blocked 3' end to
reduce non-specific background, to amplify zebrafish promoters. Devon et al. (1995) used
"splinkerette-vectorette" adaptors with special secondary structure in order to decrease non-
specific amplification of molecules with two universal sequences during ligation-mediated
PCR™. Padegimas and Reichert (1998) used phosphorothioate-blocked oligonucleotides and
exoIII digestion to remove the unligated and partially ligated molecules from the reactions
before performing PCR™, in order to increase the specificity of amplification of maize
sequences. Zhang and Gurr (2000) used ligation-mediated hot-start PCR™ of restriction
fragments using nested primers in order to amplify up to 6 kb of a fungal genome. The large
amplicons were subsequently directly sequenced using primer extension.

WO 03/050242

PCT/US02/37322

[0011] To increase the specificity of ligation-mediated PCR™ products, many 
methods have been used to "index" the amplification process by selection for specific 
sequences adjacent to one or both termini (e.g., Smith, 1992; Unrau, 1994; Guilfoyle, 1997; 
U.S. Patent No. 5,508,169). 

[0012] One-sided PCR™ can also be achieved by direct amplification using a 
combination of unique and non-unique primers. Liu and Whittier (1995) developed an 
efficient PCR strategy, thermal asymmetric interlaced (TAIL)-PCR, that utilizes nested 
sequence-specific primers together with a shorter arbitrary degenerate primer so that the 
relative amplification efficiencies of specific and non-specific products can be thermally 
controlled. Harrison et al. (1997) performed one-sided PCR™ using a degenerate 
oligonucleotide primer that was complementary to an unknown sequence and three nested 
primers complementary to a known sequence in order to sequence transgenes in mouse cells. 
U.S. Patent No. 5,994,058 specifies using a unique PCR™ primer and a second, partially 
degenerate PCR™ primer to achieve one-sided PCR™. Weber et al. (1998) used direct
PCR™ of genomic DNA with nested primers from a known sequence and 1-4 primers 
complementary to frequent restriction sites. This technique does not require restriction 
digestion and ligation of adaptors to the ends of restriction fragments.

[0013] Terminal transferase can also be used in one-sided PCR™. Cormack and
Somssich (1997) were able to amplify the termini of genomic DNA fragments using a
method called RAGE (rapid amplification of genome ends) by a) restricting the genome with
one or more restriction enzymes; b) denaturing the restricted DNA; c) providing a 3'
polythymidine tail using terminal transferase; and d) performing two rounds of PCR™ using
nested primers complementary to a known sequence as well as the adaptor. Rudi et al.
(1999) used terminal transferase to achieve chromosome walking in bacteria using a method 
of one-sided PCR™ that is independent of restriction digestion by a) denaturation of the 
template DNA; b) linear amplification using a primer complementary to a known sequence; 
c) addition of a poly C "tail" to the 3' end of the single-stranded products of linear 
amplification using a reaction catalyzed by terminal transferase; and d) PCR™ amplification 
of the products using a second primer within the known sequence and a poly-G primer 
complementary to the poly-C tail in the unknown region. The products amplified by Rudi 
(1999) have a very broad size distribution, probably caused by a broad distribution of lengths
of the linearly-amplified DNA molecules. 

[0014] RNA polymerase can also be used to achieve one-sided amplification of 
DNA. U.S. Patent No. 6,027,913 shows how one-sided PCR™ can be combined with 
transcription with RNA polymerase to amplify and sequence regions of DNA with only one 
known sequence. 

[0015] Inverse PCR™ (Ochman et al., 1988) is another method to amplify DNA
based on knowledge of a single DNA sequence. The template for inverse PCR™ is a circular 
molecule of DNA created by a complete restriction digestion, which contains a small region 
of known sequence as well as adjacent regions of unknown sequence. The oligonucleotide 
primers are oriented such that during PCR™ they give rise to primer extension products that 
extend away from the known sequence. This "inside-out" PCR™ results in linear DNA
products with known sequences at the termini. 

[0016] The disadvantages of all "one-sided PCR™" methods are that a) the lengths
of the products are restricted by the limitations of PCR™ (normally about 2 kb, but with
special reagents up to 50 kb); b) whenever the products are single DNA molecules longer
than 1 kb they are too long to sequence directly; c) in ligation-mediated PCR™ the amplicon
lengths are very unpredictable due to random distances between the universal priming site
and the specific priming site(s), resulting in some products that are too short to
walk a significant distance, some that are preferentially amplified due to small size, and
some that are too long to amplify and analyze; and d) in methods that use terminal transferase 
to add a polynucleotide tail to the end of a primer extension product, there is great 
heterogeneity in the length of the amplicons due to sequence-dependent differences in the 
rate of primer extension. 

[0017] Strand displacement amplification (Walker et al., 1996a; Walker et al.,
1996b; U.S. Patent No. 5,648,213; U.S. Patent No. 6,124,120) is a method to amplify one or
more termini of DNA fragments using an isothermal strand displacement reaction. The 
method is initiated at a nick near the terminus of a double-stranded DNA molecule, usually 
generated by a restriction enzyme, followed by a polymerization reaction by a DNA 
polymerase that is able to displace the strand complementary to the template strand. Linear 
amplification of the complementary strand is achieved by reusing the template multiple times 
by nicking each product strand as it is synthesized. The products are strands with 5' ends at a 
unique site and 3' ends that are various distances from the 5' ends. The extent of the strand
displacement reaction is not controlled and therefore the lengths of the product strands are not 
uniform. The polymerase used for strand displacement amplification does not have a 5'
exonuclease activity. 

[0018] Rolling circle amplification (U.S. Patent No. 5,648,245) is a method to 
increase the effectiveness of the strand displacement reaction by using a circular template. 
The polymerase, which does not have a 5' exonuclease activity, makes multiple copies of the 
information on the circular template as it makes multiple continuous cycles around the 
template. The length of the product is very large, typically too large to be directly
sequenced. Additional amplification is achieved if a second strand displacement primer is 
added to the reaction to use the first strand displacement product as a template.

3. In vivo cloning of DNA of random libraries

[0019] Libraries are collections of small DNA molecules that represent all parts of 
a larger DNA molecule or collection of DNA molecules (Primrose, 1998; Cantor and Smith, 
1999). Libraries can be used for analytical and preparative purposes. Genomic clone 
libraries are collections of bacterial clones containing fragments of genomic DNA. cDNA
clone libraries are collections of clones derived from mRNA molecules. 

[0020] Cloning of non-specific DNA is commonly used to separate and amplify 
DNA for analysis. DNA from an entire genome, one chromosome, a virus, or a bacterial 
plasmid is fragmented by a suitable method (e.g., hydrodynamic shearing or digestion with 
restriction enzymes), ligated into a special region of a bacterial plasmid or other cloning 
vector, transfected into competent cells, amplified as a part of a plasmid or chromosome 
during proliferation of the cells, and harvested from the cell culture. Critical to the specificity 
of this technique is the fact that the mixture of cells carrying different DNA inserts can be 
diluted and aliquoted such that some of the aliquots, whether on a surface or in a volume of 
solution, contain a single transfected cell containing a unique fragment of DNA. 
Proliferation of this single cell (in vivo cloning) amplifies this unique fragment of DNA so 
that it can be analyzed. This "shotgun" cloning method is used very frequently, because: 1) it
is inexpensive; 2) it produces very pure sequences that are usually faithful copies of the 
source DNA; 3) it can be used in conjunction with clone screening techniques to create an 
unlimited amount of specific-sequence DNA; 4) it allows simultaneous amplification of 
many different sequences; 5) it can be used to amplify DNA as large as 1,000,000 bp long; 
and 6) the cloned DNA can be directly used for sequencing and other purposes. 

[0021] Cloning is inexpensive, because many pieces of DNA can be 
simultaneously transfected into host cells. The general term for this process of mixing a 
number of different entities (e.g., electronic signals or molecules) is "multiplexing," and is a
common strategy for increasing the number of signals or molecules that can be processed 
simultaneously and subsequently separated to recover the information about the individual 
signals or molecules. In the case of conventional cloning, the recovery process involves 
diluting the bacterial culture such that an aliquot contains a single bacterium carrying a single 
plasmid, allowing the bacterium to multiply to create many copies of the original plasmid, 
and isolating the cloned DNA for further analysis. 

[0022] The principle of multiplexing different molecules in the same transfection 
experiment is critical to the economy of the cloning method. However, after the transfection 
each clone must be grown separately and the DNA isolated separately for analysis. These 
steps, especially the DNA isolation step, are costly and time consuming. Several attempts 
have been made to multiplex steps after cloning, whereby hundreds of clones can be 
combined during the steps of DNA isolation and analysis and the characteristics of the 
individual DNA molecules recovered later. In one version of multiplex cloning the DNA 
fragments are separated into a number of pools (e.g., one hundred pools). Each pool is 
ligated into a different vector, possessing a nucleic acid tag with a unique sequence, and 
transfected into the bacteria. One clone from each transfection pool is combined with one 
clone from each of the other transfection pools in order to create a mixture of bacteria having 
a mixture of inserted sequences, where each specific inserted sequence is tagged with a 
unique vector sequence, and therefore can be identified by hybridization to the nucleic acid 
tag. This mixture of cloned DNA molecules can be subsequently separated and subjected to 
any enzymatic, chemical, or physical processes for analysis such as treatment with 
polymerase or size separation by electrophoresis. The information about individual 
molecules can be recovered by detection of the nucleic acid tag sequences by hybridization, 
PCR™ amplification, or DNA sequencing. Church has shown methods and compositions to 
use multiplex cloning to sequence DNA molecules by pooling clones tagged with different 
labels during the steps of DNA isolation, sequencing reactions, and electrophoretic separation 
of denatured DNA strands (U.S. Patent Nos. 4,942,124 and 5,149,625). The tags are added to 
the DNA as parts of the vector DNA sequences. The tags used can be detected using 
oligonucleotides labeled with radioactivity, fluorescent groups, or volatile mass labels 
(Cantor and Smith, 1999; U.S. Patent Nos. 4,942,124; 5,149,625; and 5,112,736; Richterich 
and Church, 1993). A later patent was directed to a technique whereby the tag sequences
are ligated to the DNA fragments before cloning using a universal vector (U.S. Patent No. 
5,714,318). Another patent specifies a method whereby the tag sequences added before 
transfection are amplified using PCR™ after electrophoretic separation of the denatured
DNA (PCT WO 98/15644).

4. In vitro preparation of DNA as random libraries 

[0023] DNA libraries can be formed in vitro and subjected to various selection 
steps to recover information about specific sequences. In vitro libraries are rarely used in 
genomics, because the methods that exist for creating such libraries do not offer advantages 
over cloned libraries. In particular, the methods used to amplify the in vitro libraries are not 
able to amplify all the DNA in an unbiased manner, because of the size and sequence 
dependence of amplification efficiency. PCT WO 00/18960 describes how different methods 
of DNA amplification can be used to create a library of DNA molecules representing a 
specific subset of the sequences within the genome for purposes of detecting genetic 
polymorphisms. "Random-prime PCR™" (U.S. Patent No. 5,043,272; U.S. Patent No. 
5,487,985), "random-prime strand displacement" (U.S. Patent No. 6,124,120), and "AFLP"
(U.S. Patent No. 6,045,994) are three examples of methods to create libraries that represent 
subsets of complex mixtures of DNA molecules. 

[0024] Single-molecule PCR™ can be used to amplify individual randomly-
fragmented DNA molecules (Lukyanov et al., 1996). In one method, the source DNA is first
fragmented into molecules usually less than 10,000 bp in size, ligated to adaptor 
oligonucleotides, and extensively diluted and aliquoted into separate fractions such that the 
fractions often contain only a single molecule. PCR™ amplification of a fraction containing 
a single molecule creates a very large number of molecules identical to one of the original 
fragments. If the molecules are randomly fragmented, the amplified fractions represent DNA 
from random positions within the source DNA. 
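The dilution step can be made concrete with a small calculation. The sketch below assumes that molecules are distributed among the aliquots at random and independently, so that the number per aliquot follows a Poisson distribution. That statistical model, the function name `fraction_with_k_molecules`, and the chosen mean values are assumptions made for illustration and are not taken from the cited reference.

```python
import math


def fraction_with_k_molecules(mean_per_aliquot, k):
    """Poisson probability that an aliquot holds exactly k template molecules."""
    if mean_per_aliquot < 0 or k < 0:
        raise ValueError("mean_per_aliquot and k must be non-negative")
    return math.exp(-mean_per_aliquot) * mean_per_aliquot ** k / math.factorial(k)


def fraction_with_more_than_one(mean_per_aliquot):
    """Probability that an aliquot holds two or more molecules (a mixed template)."""
    empty = fraction_with_k_molecules(mean_per_aliquot, 0)
    single = fraction_with_k_molecules(mean_per_aliquot, 1)
    return 1.0 - empty - single


# Hypothetical dilutions: an average of 1.0 and of 0.1 molecules per aliquot.
single_at_1 = fraction_with_k_molecules(1.0, 1)
mixed_at_1 = fraction_with_more_than_one(1.0)
mixed_at_01 = fraction_with_more_than_one(0.1)
```

Under this assumed model, diluting to an average of one molecule per aliquot leaves a fraction of e^-1 (about 0.37) of the aliquots with exactly one molecule. Diluting further reduces the share of aliquots holding mixed templates at the cost of more empty aliquots.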

[0025] WO0015779A2 describes how a specific sequence can be amplified from a 
library of circular molecules with random genomic inserts using rolling circle amplification.

5. Direct in vivo cloning of ordered libraries of DNA

[0026] Directed cloning is a procedure to clone DNA from different parts of a 
larger piece of DNA, usually for the purpose of sequencing DNA from different positions
along the source DNA. Methods to clone DNA with "nested deletions" have been used to 
make "ordered libraries" of clones that have DNA starting at different regions along a long 
piece of source DNA. In one version, one end of the source DNA is digested with one or 
more exonuclease activities to delete part of the sequence (McCombie et al., 1991; U.S.
Patent No. 4,843,003). By controlling the extent of exonuclease digestion, the average 
amount of the deletion can be controlled. The DNA molecules are subsequently separated
based on size and cloned. By cloning molecules with different molecular weights, many 
copies of identical DNA plasmids are produced that have inserts ending at controlled 
positions within the source DNA. Transposon insertion (Berg et al., 1994) is also used to
clone different regions of source DNA by facilitating priming or cleavage at random 
positions in the plasmids. The size separation and recloning steps make both of these 
methods labor intensive and slow. They are generally limited to covering regions less than 
10 kb in size and cannot be used directly on genomic DNA, but only on cloned DNA
molecules. No in vivo methods are known to directly create ordered libraries of genomic 
DNA. 

6. Direct in vitro preparation of ordered libraries of DNA 

[0027] Ordered libraries have not been frequently created in vitro. Hagiwara 
(1996) used one-sided PCR™ to create an ordered library of PCR™ products that was used 
to sequence about 14 kb of a cosmid. The cosmids were first digested with multiple 
restriction enzymes, followed by ligation of vectorette adaptors to the products, PCR™ 
amplification of the products using primers complementary to a unique sequence in the 
cosmid and to the adaptor, size separation of the amplified DNA to establish the order of the 
restriction sites, and sequencing of the ordered PCR™ products. Because of the non-uniform
spacing of the restriction sites, 2 kb of the 16 kb region were not sequenced. This method 
required substantial effort to produce and order the PCR™ products for the job of sequencing 
cloned DNA. No in vitro methods are known to directly create ordered genomic libraries of 
DNA. 

7. Preparation of DNA 

[0028] In methods known and used in the art, molecules for sequencing are 
prepared (see, for example, Sambrook et al. (1989) or Ausubel et al. (1994)). 

[0029] Furthermore, Japan Patent No. JP8173164A2 describes a method of 
preparing DNA by sorting-out PCR amplification in the absence of cloning, fragmenting a 
double-stranded DNA, ligating a known-sequence oligomer to the cut end, and amplifying
the resultant DNA fragment with a primer having the sorting-out sequence complementary to
the oligomer. The sorting-out sequences consist of a fluorescent label and one to four bases
at 5' and 3' termini to amplify the number of copies of the DNA fragment.

[0030] U.S. Patent No. 6,107,023 describes a method of isolating duplex DNA 
fragments which are unique to one of two fragment mixtures, i.e., fragments which are 
present in a mixture of duplex DNA fragments derived from a positive source, but absent
from a fragment mixture derived from a negative source. In practicing the method, double- 
strand linkers are attached to each of the fragment mixtures, and the number of fragments in 
each mixture is amplified by successively repeating the steps of (i) denaturing the fragments 
to produce single fragment strands; (ii) hybridizing the single strands with a primer whose 
sequence is complementary to the linker region at one end of each strand, to form 
strand/primer complexes; and (iii) converting the strand/primer complexes to double-strand 
fragments in the presence of polymerase and deoxynucleotides. After the desired fragment 
amplification is achieved, the two fragment mixtures are denatured, then hybridized under 
conditions in which the linker regions associated with the two mixtures do not hybridize. 
DNA species which are unique to the positive-source mixture, i.e., which are not hybridized 
with DNA fragment strands from the negative-source mixture, are then selectively isolated. 

[0031] U.S. Patent No. 6,114,149 regards a method of amplifying a mixture of 
different-sequence DNA fragments that may be formed from RNA transcription, or derived 
from genomic single- or double-stranded DNA fragments. The fragments are treated with 
terminal deoxynucleotide transferase and a selected deoxynucleotide, to form a homopolymer 
tail at the 3' end of the anti-sense strands, and the sense strands are provided with a common
3'-end sequence. The fragments are mixed with a homopolymer primer that is homologous to
the homopolymer tail of the anti-sense strands, and a defined-sequence primer which is
homologous to the sense-strand common 3'-end sequence, with repeated cycles of fragment
denaturation, annealing, and polymerization, to amplify the fragments. In one embodiment, 
the defined-sequence and homopolymer primers are the same, i.e., only one primer is used. 
The primers may contain selected restriction-site sequences, to provide directional restriction 
sites at the ends of the amplified fragments. 

[0032] Thus, the present invention provides a new way of preparing DNA 
templates for more efficient sequencing of difficult DNA molecules, higher sequence quality, 
and longer reads. 

SUMMARY OF THE INVENTION 
[0033] The present invention is directed to preparing DNA molecules for a variety 
of purposes, including sequencing. In specific embodiments, preparation of the molecules 
comprises random fragmentation of a parent DNA molecule to produce the fragments, 
attachment of at least one primer to the fragments, and amplification of at least a portion of 
the fragments.

[0034] In an object of the present invention, there is a method of preparing a DNA
molecule, comprising obtaining a DNA molecule; randomly fragmenting the DNA molecule 
to produce DNA fragments; attaching a primer having substantially known sequence to at 
least one end of a plurality of the DNA fragments to produce primer-linked fragments; and 
amplifying a plurality of the primer-linked fragments. In a specific embodiment, the method 
further comprises concomitantly sequencing the plurality of primer-linked fragments. In 
further specific embodiments, the randomly fragmenting of the DNA molecule is by 
mechanical fragmentation, such as by hydrodynamic shearing, sonication, or nebulization, or 
chemical fragmentation, such as by acid catalytic hydrolysis, alkaline catalytic hydrolysis, 
hydrolysis by metal ions, hydroxyl radicals, irradiation, or heating. In specific embodiments, 
the heating is to a temperature of between about 40°C and 120°C, between about 80°C and 
100°C, between about 90°C and 100°C, between about 92°C and 98°C, between about 93°C 
and 97°C, or between about 94°C and 96°C. In a preferred embodiment, the heating is to a 
temperature of about 95°C. 
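The general sequence of steps (random fragmentation followed by attachment of a known sequence to fragment ends) can be sketched as a simple simulation. This is only an illustration under loud assumptions: DNA is modeled as a single-stranded text string, breaks occur independently at each position with probability 1/`mean_length`, and the known sequence is a ten-base polyG tail. The names `random_fragments`, `attach_known_sequence`, and `KNOWN_TAIL`, the tail length, and the parameter values are hypothetical and are not specified by the text.

```python
import random

# Hypothetical known sequence attached to each fragment (a polyG tail of assumed length).
KNOWN_TAIL = "G" * 10


def random_fragments(parent, mean_length, rng):
    """Break the parent string at random positions.

    Each internal position is cut with probability 1 / mean_length, so the
    break points do not depend on the sequence itself.
    """
    fragments = []
    start = 0
    for position in range(1, len(parent)):
        if rng.random() < 1.0 / mean_length:
            fragments.append(parent[start:position])
            start = position
    fragments.append(parent[start:])
    return fragments


def attach_known_sequence(fragments, tail=KNOWN_TAIL):
    """Append the known sequence to the 3' end (right-hand end) of every fragment."""
    return [fragment + tail for fragment in fragments]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
parent = "".join(rng.choice("ACGT") for _ in range(200))
fragments = random_fragments(parent, mean_length=40, rng=rng)
linked = attach_known_sequence(fragments)
```

Because every simulated fragment carries the same known 3' sequence, a single primer complementary to that sequence could in principle prime all of them, which is the property the amplification step relies on.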

[0035] In a specific embodiment, the heating of the DNA molecule is in a solution 
having from 0 to about 100 mM concentration of a salt, having from about 0 to about 10 mM 
concentration of salt, having from about 0.1 to about 1 mM concentration of salt, or having 
from about 0.1 to about 0.5 mM concentration of salt. In a specific embodiment, the heating 
is in a solution of 10 mM Tris, pH 8.0; 1 mM EDTA or a solution of water. 

[0036] In another embodiment, the random fragmenting of the DNA molecule is 
by enzymatic fragmentation, such as comprising digestion with DNAse I. In specific 
embodiments, the DNAse I digestion is in the presence of Mg2+ ions, such as in a
concentration of about 1 mM to about 10 mM. In another specific embodiment, the DNAse I
digestion is in the presence of Mn2+ ions, such as in a concentration of about 1 mM to about
10 mM. 

[0037] In a specific embodiment of the present invention, the primer is attached to 
at least one 3' end of at least one DNA fragment. In another specific embodiment, attachment
of a primer having substantially known sequence to at least one 3' end of at least one DNA
fragment comprises generation of a homopolymer extension of said DNA fragment, such as 
is generated by terminal deoxynucleotidyltransferase. In a specific embodiment, the 
homopolymeric extension comprises a polyG tract. 

[0038] In another specific embodiment, the attachment of a substantially known 
sequence to at least one 3' end of at least one DNA fragment comprises ligation of an adaptor
molecule to at least one end of the DNA fragment. In a specific embodiment, the adaptor
comprises at least one blunt end. In another specific embodiment, the adaptor comprises a
single stranded region. In a further specific embodiment, the method further comprises 
generation of at least one blunt end of said DNA fragments, such as is generated by T4 DNA 
polymerase, Klenow, or a combination thereof. 

[0039] In another object of the present invention, there is a method of preparing a 
library of DNA molecules, comprising obtaining a plurality of DNA molecules; randomly 
fragmenting at least one of the DNA molecules to produce DNA fragments; attaching a 
primer having a substantially known sequence to at least one end of a plurality of the DNA 
fragments to produce primer-linked fragments; and amplifying a plurality of the primer- 
linked fragments. In a specific embodiment, the method further comprises concomitantly 
sequencing the plurality of primer-linked fragments. 

[0040] In an additional object of the present invention, there is a library generated 
by a method described herein. 

[0041] In an additional object of the present invention, there is a method of 
generating a library of DNA templates, comprising obtaining a plurality of DNA molecules; 
randomly fragmenting the plurality of DNA molecules to produce DNA fragments; attaching 
a first primer having substantially known sequence to at least one end of a plurality of the 
DNA fragments to produce primer-linked fragments; and amplifying a plurality of the 
primer-linked fragments, wherein the amplification utilizes a second primer complementary 
to a known sequence in the DNA fragments; and a third primer complementary to the first 
primer. In a specific embodiment, the method further comprises the step of sequencing
concomitantly said plurality of DNA fragments using a fourth primer complementary to said 
known sequence in the DNA fragments. In a specific embodiment, the fourth primer is said 
second primer. 

[0042] In another object of the present invention, there is a method of sequencing 
a plurality of DNA fragments concomitantly, comprising obtaining a plurality of DNA 
molecules; randomly fragmenting the DNA molecules to generate a plurality of DNA 
fragments having overlapping sequences; attaching a first primer having a substantially 
known sequence to at least one end of the plurality of the DNA fragments to produce primer- 
linked fragments; and amplifying a plurality of the primer-linked fragments, wherein the 
amplification utilizes a second primer complementary to a known sequence in the DNA 
fragments; and a third primer complementary to the first primer; and sequencing said 
plurality of DNA fragments using a fourth primer complementary to said known sequence in 
the DNA fragments. In a specific embodiment, the fourth primer is the second primer. 

[0043] In another object of the present invention, there is a method of sequencing
a consecutive overlapping series of nucleic acid sequences, comprising the steps of obtaining 
a plurality of DNA molecules having overlapping sequences; concomitantly sequencing a 
first region in said plurality of DNA molecules using a primer complementary to a known 
sequence in said plurality of DNA molecules; and concomitantly sequencing a second region 
in said plurality of DNA molecules using a primer complementary to sequence determined 
from the sequencing of the first region, wherein the next consecutive sequencing of a region 
in the overlapping series of nucleic acid sequences is produced by initiating sequencing from 
the sequence obtained in a preceding overlapping sequencing product. In a specific 
embodiment, the obtaining step is further defined as randomly fragmenting at least one parent 
DNA molecule to generate a plurality of DNA fragments having overlapping sequences; 
attaching a first primer having a substantially known sequence to at least one end of the 
plurality of the DNA fragments to produce primer-linked fragments; and amplifying a 
plurality of the primer-linked fragments, wherein the amplification utilizes a second primer 
complementary to a known sequence in the DNA fragments; and a third primer 
complementary to the first primer. 

[0044] In an additional object of the present invention, there is a method of 
sequencing a plurality of DNA molecules, comprising obtaining said plurality of DNA 
molecules by randomly fragmenting a parent DNA molecule; sequencing concomitantly said 
plurality of DNA molecules with a primer complementary to a known sequence in said 
plurality of molecules. In a specific embodiment, the method further comprises amplification 
of the plurality of DNA molecules. In an additional specific embodiment, the amplification is 
further defined as attaching a first primer having a substantially known sequence to at least 
one end of the plurality of the DNA fragments to produce primer-linked fragments; and 
amplifying a plurality of the primer-linked fragments, wherein the amplification utilizes a 
second primer complementary to a known sequence in the DNA fragments; and a third 
primer complementary to the first primer. 

[0045] In a further object of the present invention, there is a method of preparing 
a DNA molecule having sequences which generate secondary structure in said molecule, 
comprising obtaining the DNA molecule having said sequences; randomly fragmenting the 
DNA molecule to produce a plurality of DNA fragments, wherein the plurality of DNA 
fragments comprises DNA fragments having part or all of the sequences which generate the 
secondary structure; attaching a primer having substantially known sequence to at least one 
end of a plurality of the DNA fragments to produce primer-linked fragments; and amplifying 
a plurality of the primer-linked fragments. In a specific embodiment, the method further
comprises concomitantly sequencing the plurality of primer-linked fragments. In a specific 
embodiment, the plurality of DNA fragments further comprises DNA fragments having none 
of the sequences which generate the secondary structure. In another specific embodiment, 
the secondary structure is a hairpin, a G quartet, or a triple helix. In a further specific 
embodiment, the obtained DNA molecule comprises genomic DNA, BAC DNA, or plasmid 
DNA. 

[0046] In another object of the present invention, there is a method of 
conditioning a 3' end of a DNA molecule, comprising exposing said 3' end to terminal
deoxynucleotidyltransferase. In a specific embodiment, the terminal
deoxynucleotidyltransferase is further defined as comprising 3' exonuclease activity. In
another specific embodiment, the exposing step further comprises providing a guanine 
ribonucleotide or guanine deoxyribonucleotide. 

[0047] In an additional object of the present invention, there is a method of 
providing 3' exonuclease activity to the end of a DNA molecule comprising the step of
introducing terminal deoxynucleotidyltransferase to the end of said molecule. In a specific 
embodiment, the introducing step further comprises providing a guanine ribonucleotide or 
guanine deoxyribonucleotide. 

[0048] In an additional object of the present invention, there is a method of 
preparing a probe, comprising obtaining at least one DNA molecule; randomly fragmenting 
the DNA molecule to produce DNA fragments; attaching a labeled primer having 
substantially known sequence to at least one end of a plurality of the DNA fragments to 
produce labeled primer-linked fragments; and amplifying a plurality of the primer-linked 
fragments. In a specific embodiment, the attaching step of a labeled primer comprises 
generation of a homopolymer extension of said DNA fragment, wherein said extension 
comprises the label. In a specific embodiment, the homopolymeric extension is generated by 
terminal deoxynucleotidyltransferase. In a further specific embodiment, the attaching step of 
a labeled primer comprises ligation of an adaptor molecule to at least one end of the DNA 
fragment, wherein the adaptor molecule comprises the label. In another specific 
embodiment, the label is a radionuclide, an affinity tag, a hapten, an enzyme, a chromophore, 
or a fluorophore. In another embodiment, there is a labeled probe generated from the present 
method. In an additional embodiment, there is a kit comprising a probe generated from the 
present method. 

[0049] In another object of the present invention, there is a method of repairing a
3' end of at least one single stranded DNA molecule, comprising providing to said 3' end a 
terminal deoxynucleotidyltransferase. In a specific embodiment, the providing step further 
comprises providing a guanine ribonucleotide, guanine deoxyribonucleotide, or both. 

[0050] In an additional object of the present invention, there is a kit for repairing a 
3' end of at least one single stranded DNA molecule, wherein said kit comprises a terminal
deoxynucleotidyltransferase. 

[0051] In an additional object of the present invention, there is a method of 
detecting a damaged DNA molecule, comprising the step of providing to said damaged DNA 
molecule terminal deoxynucleotidyltransferase and a labeled guanine ribonucleotide, labeled 
guanine deoxyribonucleotide, or both. In a specific embodiment, the damaged DNA 
molecule comprises a nick or a double stranded break. In another specific embodiment, the 
providing step is further defined as providing repair to said damaged DNA molecule. In an 
additional specific embodiment, the label comprises a radionuclide, an affinity tag, a hapten, 
an enzyme, a chromophore, or a fluorophore. In a further specific embodiment, the damaged 
DNA is outside a cell. In a specific embodiment, the damaged DNA is the result of radiation, 
ultraviolet light, oxygen, a radical, a metal ion, a nuclease, or mechanical force. In a specific 
embodiment, the damaged DNA is in a cell. In another specific embodiment, the cell is an
apoptotic cell. In an additional specific embodiment, the damaged DNA is the result of 
radiation, heat, ultraviolet light, oxygen, radicals, nitric oxide, catecholamine, or a nuclease. 

[0052] Other objects, features and advantages of the present invention will 
become apparent from the following detailed description. It should be understood, however, 
that the detailed description and the specific examples, while indicating preferred 
embodiments of the invention, are given by way of illustration only, since various changes 
and modifications within the spirit and scope of the invention will become apparent to those
skilled in the art from this detailed description. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0053] The following drawings form part of the present specification and are 
included to further demonstrate certain aspects of the present invention. The invention may 
be better understood by reference to one or more of these drawings in combination with the 
detailed description of specific embodiments presented herein. 



WO 03/050242 PCT/US02/37322 

[0054] FIG. 1 demonstrates preparation of a TRF library produced by random 
fragmentation and 3' end tailing.

[0055] FIG. 2 illustrates methods for random DNA fragmentation.

[0056] FIG. 3 demonstrates methods for adding a universal sequence to the 3'
ends of DNA fragments. 

[0057] FIG. 4 illustrates amplification and sequencing of a DNA library produced 
by random fragmentation. 

[0058] FIG. 5 demonstrates sequencing nested DNA templates: adaptor sequence 
contribution. 

[0059] FIG. 6 shows sequencing by walking within the amplified DNA fragment 
mixtures. 

[0060] FIG. 7 shows sequencing of nested DNA fragments as a general approach 
for difficult templates. 

[0061] FIG. 8 illustrates primary amplification of three specific regions of the E. 
coli genome from a TRF library prepared by hydrodynamic shearing. 

[0062] FIG. 9 is an additional example illustrating primary amplification of three 
specific regions of the E. coli genome from a TRF library prepared by hydrodynamic 
shearing. 

[0063] FIG. 10 is a schematic presentation of the specific region of E. coli 
genome sequenced by primer walking from a TRF library. 

[0064] FIG. 11 illustrates a schematic presentation of a 10 Kb segment of the 
human tp53 gene containing regions amplified and sequenced from a TRF library. 

[0065] FIG. 12 shows primary amplification of three specific regions of the 
human tp53 gene from a TRF library prepared by hydrodynamic shearing. 

[0066] FIG. 13 demonstrates titration of the input amount of library DNA in 
primary amplification of HS4+ priming site of the human tp53 gene from a TRF library 
prepared by hydrodynamic shearing. 

[0067] FIG. 14 shows secondary (nested) amplification of three genomic regions 
of the human tp53 gene from the hydrodynamically sheared TRF library used as sequencing 
templates. 

[0068] FIG. 15 illustrates a schematic presentation of four corn genomic regions
sequenced from a TRF library.

[0069] FIG. 16 shows a secondary (nested) amplification of an unpublished genomic
region located upstream from the Maysine enhancer on chromosome 3 from a corn genomic
TRF library prepared by hydrodynamic shearing.

[0070] FIG. 17 shows a secondary (nested) amplification of an unpublished genomic
region flanking the poly-ubiquitin 1 gene (Mub Gl) from a corn TRF library prepared by
hydrodynamic shearing.

[0071] FIG. 18 shows a comparison of the size of DNA molecules before and 
after fragmentation by the thermal treatment and the hydrodynamic shearing. 

[0072] FIG. 19 shows primary amplification of two specific regions of the E. coli 
genome from TRF libraries prepared by the thermal fragmentation and the hydrodynamic 
shearing methods. 

[0073] FIG. 20 illustrates high throughput preparation and sequence analysis of 
multiple DNA samples in the multi-well, micro-plate format. 

[0074] FIG. 21 shows kinetics of thermal fragmentation of E. coli DNA under 
different salt buffer conditions. 

[0075] FIG. 22 illustrates a depurinization mechanism of thermal fragmentation 
on a model 5' fluorescein-labeled oligonucleotide with a single purine base. 

[0076] FIG. 23 demonstrates efficiency and peculiarity of TdT-mediated tailing 
reaction when the substrate is thermally fragmented and size-fractionated human DNA. 

[0077] FIG. 24A demonstrates efficiency of TdT-mediated dGTP tailing reaction 
when the substrates are thermally fragmented and intact 5' fluorescein-labeled 
oligonucleotide with a single guanine base and blocking AmMod C7 group at the 3' end. 

[0078] FIG. 24B demonstrates efficiency of TdT-mediated dGTP tailing reaction 
when the substrates are thermally fragmented and intact 5' fluorescein-labeled 
oligonucleotide with a single adenine base and blocking AmMod C7 group at the 3' end.

[0079] FIG. 24C demonstrates efficiency of TdT-mediated dATP tailing reaction 
when the substrates are thermally fragmented and intact 5' fluorescein-labeled 
oligonucleotide with a single guanine base and native 3' -OH group. 

[0080] FIG. 25A shows effect of the dGTP concentration on efficiency of the 
TdT-mediated repair / tailing reaction when the substrate is 5' fluorescein-labeled 
oligonucleotides with blocking AmMod C7 group at the 3' end. 

[0081] FIG. 25B shows effect of the dGTP concentration on efficiency of TdT-mediated
tailing reaction when the substrate is 5' fluorescein-labeled oligonucleotide with
native OH group at the 3' end.

[0082] FIG. 26A demonstrates a unique role of the dGTP nucleotide in the TdT-mediated
repair / tailing reaction on the 5' fluorescein-labeled oligonucleotide substrate with
blocking AmMod C7 group at the 3' end. 

[0083] FIG. 26B illustrates inability of the TdT enzyme to repair and elongate in 
the presence of dGTP an oligo template with dideoxy cytosine blocking group at the 3' end.

[0084] FIG. 27 shows that TdT-mediated riboGTP tailing of the oligonucleotide 
with blocking AmMod C7 group occurs after removal of the modified base and additional 1 
or 2 bases from the 3' end of the substrate. 

[0085] FIG. 28 demonstrates a length-controlled, TdT-mediated tailing reaction of 
the 5' fluorescein-labeled oligonucleotide substrate in the presence of a mixture of ribo- and 
deoxy GTP nucleotides. 

DETAILED DESCRIPTION OF THE INVENTION 
[0086] In keeping with long-standing patent law convention, the words "a" and 
"an" when used in the present specification in concert with the word comprising, including 
the claims, denote "one or more." 

[0087] The practice of the present invention will employ, unless otherwise 
indicated, conventional techniques of molecular biology, microbiology, recombinant DNA, 
and so forth which are within the skill of the art. Such techniques are explained fully in the 
literature. See e.g., Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A 
LABORATORY MANUAL, Second Edition (1989), OLIGONUCLEOTIDE SYNTHESIS 
(M. J. Gait Ed., 1984), ANIMAL CELL CULTURE (R. I. Freshney, Ed., 1987), the series
METHODS IN ENZYMOLOGY (Academic Press, Inc.); GENE TRANSFER VECTORS
FOR MAMMALIAN CELLS (J. M. Miller and M. P. Calos eds. 1987), HANDBOOK OF
EXPERIMENTAL IMMUNOLOGY, (D. M. Weir and C. C. Blackwell, Eds.), CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel, R. Brent, R. E. Kingston, D.
D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl, eds., 1987), CURRENT PROTOCOLS
IN IMMUNOLOGY (J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach and
W. Strober, eds., 1991); ANNUAL REVIEW OF IMMUNOLOGY; as well as monographs 
in journals such as ADVANCES IN IMMUNOLOGY. All patents, patent applications, and 
publications mentioned herein, both supra and infra, are hereby incorporated herein by 
reference. 

[0088] U.S. 6,197,557 is incorporated by reference herein in its entirety.

I. The Present Invention

[0089] The present invention is directed to methods to prepare a DNA molecule
or a library of DNA molecules, or both. The preparation of the DNA molecule comprises 
random fragmentation of the molecule and amplification of at least one fragment of the 
molecule. Although the prepared molecule may be used for any purpose known in the art, in 
a specific embodiment it is used for sequencing of at least a portion of the molecule. The 
present invention is also directed to libraries of DNA molecules, particularly fragments of the 
molecules generated by random fragmentation of at least one parent DNA. In a specific 
embodiment, the library members are sequenced concomitantly. 

[0090] The term "random fragmentation" as used herein refers to the 
fragmentation of a DNA molecule in a non-ordered fashion, such as irrespective of the 
sequence identity or position of the nucleotide comprising and/or surrounding the break. 

[0091] In a specific embodiment, the fragments generated by random 
fragmentation are amplified prior to sequencing. A skilled artisan recognizes that
amplification of randomly generated DNA fragments, in some embodiments
differing in length by only a nucleotide, produces a mixture of molecules of different lengths
terminating at different positions. Such a mixture on a gel would present as a smear, 
suggesting an inability to be utilized as templates for sequencing with clarity. However, the 
present invention is directed to utilizing this mixture of fragments of different lengths that 
terminate at different positions as sequencing templates. Furthermore, in specific 
embodiments, the mixture of fragments is sequenced concomitantly.

[0092] In another specific embodiment, a series of overlapping sequences is
generated by random fragmentation, the fragments are sequenced concomitantly in a 
particular region, and walking then occurs along the overlapping sequences by utilizing 
sequence determined in the preceding region. 
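
The walking scheme described above may be sketched as follows. This is an illustrative model only: the template is a text string, each sequencing round is assumed to return a fixed number of bases downstream of the primer, and the names `primer_walk`, `read_len`, and `primer_len` are hypothetical.

```python
import random


def primer_walk(template, first_primer, read_len=40, primer_len=12):
    """Walk along `template`, priming each round from sequence read in the last one."""
    site = template.find(first_primer)
    if site < 0:
        raise ValueError("first primer does not match the template")
    primer = first_primer
    assembled = first_primer
    search_from = site
    rounds = 0
    while True:
        site = template.find(primer, search_from)
        start = site + len(primer)
        read = template[start:start + read_len]  # one sequencing read (assumed length)
        if not read:
            break  # reached the end of the template
        assembled += read
        rounds += 1
        # The next primer is taken from the end of the sequence just determined.
        primer = assembled[-primer_len:]
        search_from = start + len(read) - len(primer)
    return assembled, rounds


rng = random.Random(3)
template = "".join(rng.choice("ACGT") for _ in range(200))  # hypothetical region
first_primer = template[20:32]  # hypothetical known sequence
assembled, rounds = primer_walk(template, first_primer)
print(rounds, len(assembled))
```

Each round extends the assembled sequence, and the primer for the next round is derived only from bases already read, which is the essential feature of walking along overlapping sequences.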

A. Preparation of randomly fragmented DNA 

[0093] A library is prepared in at least two steps: first, random fragmentation of 
DNA into 1-5 kb pieces and, second, attachment of universal adaptor sequence to the ends of 
DNA fragments, preferably the 3' ends (FIG. 1). These libraries are referred to as Tailed, 
Randomly Fragmented (TRF) DNA libraries. 

[0094] Random fragmentation of DNA can be achieved by methods well-known
in the art, several examples of which are illustrated in FIG. 2.

1. Mechanical fragmentation

[0095] Mechanical fragmentation can occur by any method known in the art,
including hydrodynamic shearing of DNA by passing it through a narrow capillary or
orifice (Oefner et al., 1996; Thorstenson et al., 1998), sonicating the DNA, such as by
ultrasound (Bankier, 1993), and/or nebulizing the DNA (Bodenteich et al., 1994).
Mechanical fragmentation usually results in double strand breaks within the DNA molecule.

2. Chemical fragmentation, including thermal fragmentation

[0096] Chemical fragmentation of DNA can be achieved by any method known in 
the art, including acid or alkaline catalytic hydrolysis of DNA (Richards and Boyer, 1965), 
hydrolysis by metal ions and complexes (Komiyama and Sumaoka, 1998; Franklin, 2001; 
Branum et al., 2001), hydroxyl radicals (Tullius, 1991; Price and Tullius, 1992) or radiation
treatment of DNA (Roots et al., 1989; Hayes et al., 1990). Chemical treatment could result
in double or single strand breaks, or both. 

[0097] In the present invention, a novel method is provided for introducing breaks
into a DNA molecule: the thermal fragmentation of DNA. Thermal fragmentation is
defined as generating double or single strand breaks, or both, in a DNA molecule when the 
molecule is in the presence of a temperature greater than room temperature, in some 
embodiments at least about 40°C. In alternative embodiments, the temperature is ambient 
temperature. In further specific embodiments, the temperature is between about 40°C and 
120°C, between about 80°C and 100°C, between about 90°C and 100°C, between about 92°C 
and 98°C, between about 93°C and 97°C, or between about 94°C and 96°C. In some 
embodiments, the temperature is about 95°C. In some embodiments, the temperature is 
greater than 100°C. A skilled artisan recognizes that parameters other than temperature may 
affect the breakage, such as pH and/or salt concentration. In specific embodiments, the 
conditions of thermal fragmentation comprise neutral pH (pH 6.0 - 9.0) in low salt buffer
(L-TE buffer) at 95°C (about 80°C - 100°C temperature range). The methods of the present
invention produce DNA molecules that can, for example, be efficiently tailed at the 3' ends
with the homopolymeric G-stretches using terminal transferase. In other embodiments,
adaptors may be ligated to the fragment ends.

[0098] DNA can be efficiently fragmented at neutral pH by heat (Eigner et al.,
1961). Due to instability of purine-glycosyl bonds, DNA incubation at high temperature 
results in release of purines from DNA, or depurination. Depurinated DNA, in turn, becomes 
susceptible to heat-induced hydrolysis at apurinic sites. Both processes occur at a very slow 
but physiologically significant rate (Greer and Zamenhof, 1962; Lindahl and Nyberg, 1972;
Lindahl and Andersson, 1972). Probably because of its low rate in standard buffers, heat- 
induced DNA hydrolysis was never used in standard molecular biology procedures to 
fragment DNA. 

[0099] Thus, in the present invention, a validated and optimized method is
provided for introducing breaks into DNA molecules: the thermal fragmentation of DNA at
neutral pH (pH 6.0 - 9.0) in low salt buffer (L-TE buffer) at 95°C (about 80°C - 100°C
temperature range). The method produces DNA molecules, such as about 50 to about 2,000
bases long, and the fragment length can be reproducibly controlled by time of heating and salt 
or buffer concentration, or both (FIG. 21, Example 11). The cleavage occurs mostly at purine 
sites and, in some cases, at pyrimidine bases (FIG. 22, Example 12). Thermal fragmentation 
produces DNA molecules that, at least, can be efficiently tailed at the 3' end with 
homopolymeric G-stretches using terminal transferase or that can be ligated with adaptors. 

[0100] Thermally fragmented DNA can be used to prepare random DNA libraries 
or DNA probes. 

3. Enzymatic fragmentation 

[0101] Enzymatic fragmentation of DNA may be utilized by standard methods in 
the art, such as by partial restriction digestion by CviJI endonuclease (Gingrich et al., 1996),
or by DNase I (Anderson, 1981; Ausubel et al., 1987). Fragmentation by DNase I may
occur in the presence of Mg2+ ions (about 1-10 mM; predominantly single strand breaks) or in
the presence of Mn2+ ions (about 1-10 mM; predominantly double strand breaks).

[0102] Among these methods, the hydrodynamic shearing process produces DNA 
molecules with an appropriate and narrow size distribution (FIG. 2). For example, the 
commercially available device HydroShear (GeneMachines, Palo Alto, CA) can randomly 
fracture the DNA to within a two-fold size distribution with the average size of molecules 
ranging from 1.5 kb to 5 kb. The method does not introduce any additional modifications to 
the DNA, and the fragments can be directly used for 3' end tailing with the enzyme terminal
deoxynucleotidyltransferase (TdT) or for ligation with blunt-end adaptors. 

B. Sequence attachment to the ends of DNA fragments 

[0103] A primer is attached to the ends of DNA fragments, preferably the 3' ends,
and this can be achieved by any means known in the art. A skilled artisan recognizes that the 
primer can be, for example, a homopolymeric tail generated by terminal 
deoxynucleotidyltransferase or ligation of an adaptor sequence (FIG. 3). 

[0104] The primer, in a specific embodiment, comprises a substantially known
sequence. A skilled artisan recognizes that "substantially known" refers to having sufficient 
sequence information in order to permit preparation of a DNA molecule, including its 
amplification. This will typically be about 100%, although in some embodiments some of the 
primer sequence is random. Thus, in specific embodiments, substantially known refers to 
about 50% to about 100%, about 60% to about 100%, about 70% to about 100%, about 80% 
to about 100%, about 90% to about 100%, about 95% to about 100%, about 97% to about 
100%, about 98% to about 100%, or about 99% to about 100%. 
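
As an illustration of the percentages above, the following sketch computes the fraction of a primer that is defined. It assumes, as an interpretation not stated in the specification, that random positions are written as "N" and that the percentage is the share of positions holding a defined base; the function name and the example 20-mer are hypothetical.

```python
def known_fraction(primer):
    """Fraction of positions that hold a defined base (A, C, G, T) rather than N."""
    primer = primer.upper()
    known = sum(base in "ACGT" for base in primer)
    return known / len(primer)


# Hypothetical 20-base primer whose last two positions are random.
example = "GACTCGAGTCGACATCGANN"
print(f"{known_fraction(example):.0%} of the primer sequence is known")
```

Under this reading, the example primer falls within the "about 90% to about 100%" range recited above.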

[0105] A skilled artisan recognizes that following fragmentation of the DNA, the 
generated fragment molecules may require conditioning, herein defined as modification to the 
ends to facilitate further steps for the fragment. For example, a 3' end may require 
conditioning following fragmentation, a 5' end may require conditioning following 
fragmentation, or both. In a specific embodiment, a 3' end requires conditioning following
thermal fragmentation or mechanical fragmentation. In a further specific embodiment, the
conditioning comprises modification of a 3' end lacking a 3' OH group. In an additional
specific embodiment, said 3' end is conditioned through exonuclease activity by an
exonuclease, such as a 3' exonuclease, to enzymatically remove the distal nucleotides of the
fragment molecule. In a preferred embodiment, terminal deoxynucleotidyltransferase is 
utilized for such an action. In an alternative embodiment, an enzyme other than terminal 
deoxynucleotidyltransferase is utilized, such as T4 DNA polymerase or DNA polymerase I, 
including Klenow. 

1. Terminal deoxynucleotidyltransferase tailing 

[0106] The simplest and fastest protocol involves addition of guanine
nucleotides by the enzyme terminal deoxynucleotidyltransferase (TdT) (FIG. 23A). In this
case short (10-20 bases) poly G tails are synthesized at the 3' ends of DNA fragments. The
fragments for TdT-mediated tailing could be double or single stranded. The poly G tails can
also be efficiently added to the 3' DNA termini at the nicks introduced into DNA randomly,
for example, by DNase I or another method (see, for example, U.S. Patent No. 6,197,557 B1).
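
The tailing step can be pictured with a short sketch in which a fragment is written 5' to 3' as a text string and a poly G tail of 10-20 bases is appended to its 3' end. The tail-length distribution, the function names, and the 10-base poly C primer shown as complementary to the tail are illustrative assumptions rather than parameters given in the specification.

```python
import random


def add_poly_g_tail(fragment, rng, min_len=10, max_len=20):
    """Append a poly G tail of random length to the 3' end (right side) of `fragment`."""
    tail_len = rng.randint(min_len, max_len)  # inclusive bounds
    return fragment + "G" * tail_len


def reverse_complement(seq):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


rng = random.Random(1)
fragment = "ATGCCGTTAGC"  # hypothetical fragment
tailed = add_poly_g_tail(fragment, rng)
universal_primer = reverse_complement("G" * 10)  # a poly C primer that anneals to the tail
print(tailed, universal_primer)
```

Because every tailed fragment ends in the same homopolymer, a single primer complementary to the tail can in principle prime all of them, which is what makes the tail usable as a universal sequence.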

[0107] It is a general consensus that terminal transferase requires a 3' hydroxyl
for addition of dGTP to synthesize the poly G tail (Grosse and Rougeon, 1993). In the present 
invention, terminal transferase is successfully used to tail DNA produced by hydrodynamic 
shearing and thermal fragmentation. Chain cleavage by heat seems to take place at the 3' side 
of the apurinic sugar residue and involve the β-elimination reaction (Brown and Todd, 1955).
As a result, 3' termini with a nucleotide end having a 3'-OH residue are only generated to a
very minor extent (Kotaka and Baldwin, 1964; Lindahl and Andersson, 1972). Results
presented in FIG. 23 and FIG. 24 demonstrate that terminal transferase can efficiently tail 3'
DNA termini produced by thermal fragmentation, suggesting a novel 3' exonuclease activity
for terminal deoxynucleotidyltransferase. Such "proofreading" activity is a well known 
feature of many DNA polymerases, but it was never documented before for terminal 
transferase. 

[0108] The repair activity of terminal transferase is very different from the 3' exo-activity
of DNA polymerases: it requires a cofactor and is manifested only in the presence of
dGTP nucleotide (FIG. 4 and FIG. 26). The absence of tailing of 3' blocked termini in the
presence of dATP, dCTP and dTTP (FIG. 26) suggests a special role for deoxyguanosine
triphosphate in the repair process catalyzed by TdT. In fact, dGTP plays a dual role in the
tailing mechanism catalyzed by terminal transferase. First, it serves as a cofactor that induces
the end repair process and eliminates terminal residue(s); second, it serves as a substrate for
the tailing reaction. The number of residues removed by the terminal transferase 3' exonuclease
activity is about 1-3 bases (FIG. 27). The concentration of dGTP is critical and
should exceed about 40 µM (FIG. 25).

[0109] Guanosine triphosphate (riboGTP) can also stimulate the repair / tailing
process by TdT enzyme (FIG. 27). Ribo-triphosphates are good substrates for terminal
transferase but only a few bases can be incorporated (Boule et al., 2001). In this invention, a
balanced mixture of ribo- and deoxy GTP nucleotide provides a solution for the
length-controlled, TdT-mediated G-repair/tailing reaction that allowed addition of 8-12 guanine
bases to DNA fragments produced by hydro-shearing or thermal fragmentation (FIG. 28). 

2. Ligation of the adaptor 

[0110] There are two types of adaptors that can be ligated to the ends of randomly
generated DNA fragments (FIG. 3B). 

[0111] The "blunt-end" adaptor can be attached to the ends of double stranded 
DNA fragments produced by any fragmentation method (usually mechanical or enzymatic) 
(FIG. 3A; left side). Some methods of fragmentation would require an additional step that 
involves a repair of the DNA ends by T4 DNA polymerase and/or Klenow fragment and the 
removal of the 3' or 5' protrusions.

[0112] The structure of the "blunt-end" adaptor is shown on the left side of FIG. 
3B, and it is similar to an adaptor of U.S. Patent No. 6,197,557 B1 and U.S. Patent
Application 09/860,738, both incorporated by reference herein. The most important feature
of this adaptor is the blocking groups at both 3' ends that prevent adaptors from self-ligation.
The phosphate group is present at one end of the adaptor to direct its ligation in only one 
orientation to DNA ends. 

[0113] The "single-stranded DNA" adaptor with short 3' overhang containing 4-6
random bases (denoted "N" in FIG. 3B) and the phosphorylated recessed 5' end can be
attached to the 3' ends of single stranded DNA molecules (FIG. 3A). Some methods of
fragmentation would require an additional step that involves a repair of the 3' ends of single
stranded molecules by the T4 DNA polymerase, Klenow fragment or exonuclease I. 

[0114] The structure of the "single-stranded DNA" adaptor is shown on the right 
side of the FIG. 3B, and it is similar to the adaptor design of U.S. Patent Application 
09/860,738, incorporated by reference herein. 

[0115] The adaptor has blocking groups at both 3' ends that prevent adaptors from
self-ligation. The phosphate group is present at the recessed 5' end of the adaptor. The 4-6
base 3' overhang of the adaptor has a random base composition. In specific embodiments, it
facilitates the annealing and ligation of the adaptor to single stranded DNA molecules.
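
To give a sense of scale for the random overhang, the sketch below enumerates the possible overhang sequences for lengths of 4, 5, and 6 bases. It assumes that each overhang position can independently be any of the four bases; the function name is hypothetical.

```python
from itertools import product


def overhang_variants(n):
    """List every possible overhang of length `n` over the four DNA bases."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


# Number of distinct overhangs for each length named in the text.
counts = {n: len(overhang_variants(n)) for n in (4, 5, 6)}
print(counts)
```

Under that assumption, a pool of adaptors with a fully random overhang contains 4^n different species, so some member of the pool can base-pair with essentially any 3' terminal sequence of a single stranded fragment.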

C. Amplification and direct sequencing of specific DNA regions using 
randomly fragmented and tailed DNA libraries 

[0116] The TRF library prepared by random DNA fragmentation is a highly 
redundant DNA library. Amplification of many overlapping DNA molecules by standard 
PCR™ using one sequence-specific and one universal primer (denoted "U" in FIG. 4) would 
result in selection and amplification of a very large population of molecules, specifically, a 
nested set of DNA fragments of different length which share the same priming site 
complementary to the primer P1 (FIG. 4). Because the frequency of DNA breaks introduced 
by previously described techniques is high (potentially at every base position), the number of 
DNA fragments of different length amplified by PCR™ is also very large. 

[0117] It is not obvious that the amplified molecules could be directly used for 
DNA sequencing using the same primer P1 (or nested primer P2) as a sequencing primer. Two 
factors could potentially affect the quality and length of the resulting sequencing ladder. 
First, the bias toward a preferential amplification of the shortest DNA fragments could reduce 
the length of DNA sequencing. Second, the overlap between the universal adaptor sequence 
(at the randomly created end) of short DNA fragments and the DNA sequence of longer 
fragments could result in ambiguities in the base identification in the region of overlap. 

[0118] In confirmation of data presented in U.S. Patent Application Serial No. 
60/288,205, incorporated by reference herein, regarding libraries of nick translation- 
generated molecules, the inventors found that even more complex mixtures of nested 
molecules generated by PCR™ using TRF libraries (using one or more sequence-specific 
primers and one universal primer) can also be directly used for sequence analysis. 

[0119] The adaptor sequence, which is located at different distances for different 
fragments, does not affect at all the quality of the sequencing data (FIG. 5). Assuming that 
the average size of the TRF library is 1500 bases and the size of the universal sequence at the 
3' end (for example, G tail) is 10 bases, there are only 10 fragments that overlap at the 
randomly chosen base position within the DNA (the star on FIG. 5) with the adaptor 
sequence (a circle on FIG. 5). For example, at the base position number 501 (the distance 
from the 3' end of the sequencing primer) about 1000 molecules contribute correct DNA 
sequencing information, and only 10 templates produce a signal generated by the universal 
adaptor sequence (FIG. 5). The expected noise-to-signal ratio due to this overlap is only 
about 10/1000 = 1%. That number is much smaller than the noise-to-signal ratio estimated in 
the case of libraries produced by partial digestion with frequently cutting restriction enzymes 
(see U.S. Patent Application 60/288,205). Practically, it means that the contribution of the 
adaptor sequence to the sequencing ladder in the case of DNA generated from the TRF 
library is negligible. 
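
The estimate above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes one fragment end at every base position up to the average library size, and the function name and parameters are hypothetical rather than part of the described method.

```python
def adaptor_noise_ratio(avg_fragment_len: int, tail_len: int, position: int) -> float:
    """Estimate the noise-to-signal ratio at `position` bases from the primer.

    Simplifying assumption: the nested set contains exactly one template
    ending at each base position from 1 to avg_fragment_len.
    """
    if not 0 < position <= avg_fragment_len:
        raise ValueError("position must lie within the average fragment length")
    # Templates that extend to or beyond the position carry genuine sequence there.
    signal = avg_fragment_len - position + 1
    # Templates that end within tail_len bases before the position place the
    # universal tail (adaptor) sequence over it instead.
    noise = min(tail_len, position - 1)
    return noise / signal


# Values from the paragraph above: 1500-base library, 10-base G tail, position 501.
ratio = adaptor_noise_ratio(avg_fragment_len=1500, tail_len=10, position=501)
print(f"noise-to-signal ratio: {ratio:.1%}")
```

Under this model the ratio rises for positions far from the primer, because fewer templates reach them while the number of overlapping tails stays fixed at the tail length.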

D. Sequencing by primer "walking" within the DNA amplicons generated 
from TRF libraries 

[0120] The average size of DNA fragments within the TRF library sets a limit for 
the maximal length of DNA molecules within a population of nested molecules generated by 
PCR™ (FIG. 6). The first sequencing primer S1 (also a sequence-specific primer during the 
PCR™ amplification step) would allow determination of the sequence of the region W1 (600 
- 800 bases). The rest of the amplicon can be sequenced using sequencing primers S2 and S3 
by generating the sequence information for the regions W2 and W3, respectively. 

[0121] This strategy can help to resolve problems that usually occur when 
sequencing DNA with repeats. By choosing the PCR™ primer in the unique DNA region 
(region S1 in FIG. 6) one can amplify larger pieces of DNA containing repetitive regions. 
For example, if the repetitive DNA element is within the region W2, then the two unique 
sequences W1 and W3 would allow an unambiguous assembly of the sequencing reads W1, 
W2 and W3 into a contiguous genomic sequence. 
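
The layout of walking windows such as W1, W2 and W3 can be sketched as follows. This is a minimal illustration, not the inventors' procedure: the function name, the 2000-base amplicon, the 700-base read length and the 50-base overlap between consecutive reads are all assumed values chosen for the example.

```python
def walking_windows(amplicon_len: int, read_len: int, overlap: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) windows that tile an amplicon with successive reads.

    Each new window starts `overlap` bases before the end of the previous one,
    reflecting that the next primer must lie in already-determined sequence.
    """
    if amplicon_len <= 0:
        raise ValueError("amplicon_len must be positive")
    if read_len <= overlap:
        raise ValueError("read_len must exceed overlap")
    windows = []
    start = 0
    while True:
        end = min(start + read_len, amplicon_len)
        windows.append((start, end))
        if end >= amplicon_len:
            break
        start = end - overlap
    return windows


# Hypothetical 2000-base amplicon read in 700-base steps with 50-base overlaps.
for name, (start, end) in zip(("W1", "W2", "W3"), walking_windows(2000, 700, 50)):
    print(name, start, end)
```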

E. Nested DNA fragments as a general approach to sequence difficult DNA 
templates 

[0122] There is an important reason why the use of mixtures of nested DNA 
molecules for DNA sequencing might be in general better than the use of standard DNA 
templates with a homogeneous size: plasmids, PCR™ products, etc. Assume that 
there are two regions, A and B, within the DNA fragment that can form the intramolecular 
structure shown in FIG. 7A. During a sequencing reaction, the indicated region could 
make it difficult for the DNA polymerase to replicate through. As a result, the fragment will 
be sequenced only up to the region L. 

[0123] In the case of a mixture of nested fragments, the DNA can be easily 
sequenced over a much longer distance (FIG. 7B). In this case, a significant fraction of DNA 
molecules will not form a hairpin structure, so the polymerase can easily replicate the DNA 
and create a sequencing ladder up to the region M. A skilled artisan recognizes that there are 
multiple examples of secondary structure, including hairpins, G quartets, triple helices, and 
the like, and that the methods of the present invention are advantageous for preparing DNA 
molecules and subsequent manipulations, such as sequencing, having such structure. 

[0124] There are several ways of implementing this method for general 
sequencing applications. First, the nested molecules can be generated by the procedures that 
have been described above. For example, recombinant plasmid DNA or PCR™ products are 
randomly fragmented, G-tailed with terminal deoxynucleotidyltransferase, and re-amplified 
by PCR™ using an M13 primer (in the case of plasmid DNA) or one of the primers used for 
generation of the PCR™ product, together with a universal polyC primer. This method potentially can handle 
very small amounts of the original (homogeneous in size) DNA template. Secondly, the 
preparation of the improved DNA templates for DNA sequencing can be limited to just 
random fragmentation of the original DNA. 

F. Applications for the Present Invention 

[0125] In specific embodiments, the methods of the present invention are utilized 
for an application, non-limiting examples of which are provided below. 

[0126] In one embodiment, there is a method of conditioning a 3' end of a DNA 
molecule comprising exposing the 3' end to terminal deoxynucleotidyltransferase, wherein 
the terminal deoxynucleotidyltransferase comprises 3' exonuclease activity, a novel activity 
described herein. In preferred embodiments, the exposing step further comprises providing a 
guanine ribonucleotide, guanine deoxyribonucleotide, or both. 

[0127] In another embodiment, there is a method of providing 3' exonuclease 
activity to the end of a DNA molecule comprising the step of introducing terminal 
deoxynucleotidyltransferase to the end of the molecule. In specific embodiments, the 
introducing step further comprises providing a guanine ribonucleotide, guanine 
deoxyribonucleotide, or both. 

[0128] In an additional embodiment, there is a method of preparing a probe, 
comprising obtaining at least one DNA molecule; randomly fragmenting the DNA molecule 
to produce DNA fragments; attaching a labeled primer having substantially known sequence 
to at least one end of a plurality of the DNA fragments to produce labeled primer-linked 
fragments; and amplifying a plurality of the primer-linked fragments. In specific 
embodiments, the attaching step of a labeled primer comprises generation of a homopolymer 
extension of said DNA fragment, wherein said extension comprises the label. In a specific 
embodiment, the homopolymeric extension is generated by terminal 
deoxynucleotidyltransferase. In an alternative embodiment, the attaching step of a labeled 
primer comprises ligation of an adaptor molecule to at least one end of the DNA fragment, 
wherein the adaptor molecule comprises the label, examples of which include a radionuclide, 
an affinity tag, a hapten, an enzyme, a chromophore, or a fluorophore. The present invention 
also includes a labeled probe generated from this method or a kit comprising the probe. 

[0129] In an additional embodiment of the present invention, there is a method of 
repairing a 3' end of at least one single stranded DNA molecule, comprising, providing to the 
3' end a terminal deoxynucleotidyltransferase. In a specific embodiment, the providing step 
further comprises providing a guanine ribonucleotide, guanine deoxyribonucleotide, or both. 
A skilled artisan recognizes that the term "repair" as used herein is defined as excision of at 
least one nucleotide from a 3' end of a DNA molecule, and polymerization. In a specific 
embodiment, the polymerization step is subsequent to the excision step. In a specific 
embodiment, the distal 3' nucleotide is damaged, a non-limiting example of which is defined 
as lacking a 3' OH group. In another embodiment, the terminal deoxynucleotidyltransferase 
comprises either activity for the excision of at least one nucleotide or comprises the activity 
for polymerization. In a specific embodiment, another enzyme facilitates an excision or 
polymerization process, or both. In a specific embodiment, in repair by terminal 
deoxynucleotidyltransferase, about 1-3 bases are excised prior to tailing in a polymerization 
reaction. 

[0130] In another embodiment, there is a kit for repairing a 3' end of at least one 
single stranded DNA molecule, wherein said kit comprises a terminal 
deoxynucleotidyltransferase. In a further specific embodiment, the kit comprises a guanine 
ribonucleotide, guanine deoxyribonucleotide, or both, and in other specific embodiments the 
guanine ribonucleotide and/or guanine deoxyribonucleotide is labeled. 

[0131] In an additional object of the present invention, there is a method of 
detecting a damaged DNA molecule, comprising the step of providing to the damaged DNA 
molecule terminal deoxynucleotidyltransferase and a labeled guanine ribonucleotide, labeled 
guanine deoxyribonucleotide, or both. In non-limiting examples, the damaged DNA 
molecule comprises a nick or a double stranded break, or both. In another specific 
embodiment, the providing step is further defined as providing repair to the damaged DNA 
molecule. In an additional specific embodiment, the label comprises a radionuclide, an 
affinity tag, a hapten, an enzyme, a chromophore, or a fluorophore. Factors causing DNA 
breaks in vivo include (ionizing) radiation, heat, UV light, oxygen, radicals, nitric oxide 
(NO), catecholamine, and/or apoptosis (nucleases). Factors causing DNA breaks in vitro 
include (ionizing) radiation, UV light, oxygen, radicals, metal ions, nucleases, 
mechanical/hydrodynamic forces, and/or chemical reagents. 

II. DNA Sequencing 

[0132] The present invention is directed to methods for preparing DNA molecules 
for DNA sequencing, particularly following amplification. A skilled artisan recognizes that 
the following methods are suitable for sequencing subsequent to generation of templates 
using methods described herein. 

A. Maxam-Gilbert method 

[0133] The Maxam-Gilbert method involves degrading DNA at a specific base 
using chemical reagents. The DNA strands terminating at a particular base are denatured and 
electrophoresed to determine the positions of the particular base. The Maxam-Gilbert 
method involves dangerous chemicals, and is time- and labor- intensive. It is no longer used 
for most applications. 

B. Sanger method 

[0134] The Sanger sequencing method is currently the most popular format for 
sequencing. It employs single-stranded DNA (ssDNA) created using special viruses like 
M13 or by denaturing double-stranded DNA (dsDNA). An oligonucleotide sequencing 
primer is hybridized to a unique site of the ssDNA and a DNA polymerase is used to 
synthesize a new strand complementary to the original strand using all four 
deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and small amounts of 
one or more dideoxyribonucleotide triphosphates (ddATP, ddCTP, ddGTP, and/or ddTTP), 
which cause termination of synthesis. The DNA is denatured and electrophoresed into a 
"ladder" of bands representing the distance of the termination site from the 5' end of the 
primer. If only one ddNTP (e.g., ddGTP) is used only those molecules that end with guanine 
will be detected in the ladder. By using ddNTPs with four different labels all four ddNTPs 
can be incorporated in the same polymerization reaction and the molecules ending with each 
of the four bases can be separately detected after electrophoresis in order to read the base 
sequence. 
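
The way a ladder encodes the base sequence can be illustrated with a small simulation. This is a sketch under simplifying assumptions: it takes the newly synthesized strand as given, assumes exactly one terminated fragment per position, and measures fragment length from the start of the extension rather than from the 5' end of the primer. The function names are hypothetical.

```python
def sanger_ladder(synthesized: str) -> dict[str, list[int]]:
    """For each base, list the lengths of fragments terminated at that base."""
    ladder = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # A ddNTP incorporated at this position yields a fragment of this length.
        ladder[base].append(length)
    return ladder


def read_ladder(ladder: dict[str, list[int]]) -> str:
    """Reconstruct the sequence by ordering all bands from shortest to longest."""
    by_length = {length: base for base, lengths in ladder.items() for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


ladder = sanger_ladder("GATTACA")
print(ladder["A"])          # band lengths that would appear with ddATP alone
print(read_ladder(ladder))  # sequence read from all four sets of bands
```

With a single ddNTP only one of the four lists is observed; with four distinguishable labels all four lists are recovered from the same reaction and merged by fragment length.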

[0135] Although a variety of polymerases may be used, the use of a modified T7 
DNA polymerase (Sequenase™) was a significant improvement over the original Sanger 
method (Sambrook et al., 1988; Hunkapiller, 1991). T7 DNA polymerase does not have any 
inherent 5'-3' exonuclease activity and has a reduced selectivity against incorporation of 
ddNTP. However, the 3'-5' exonuclease activity leads to degradation of some of the 
oligonucleotide primers. Sequenase™ is a chemically-modified T7 DNA polymerase that has 
reduced 3' to 5' exonuclease activity (Tabor et al., 1987). Sequenase™ version 2.0 is a 
genetically engineered form of the T7 polymerase which completely lacks 3' to 5' 
exonuclease activity. Sequenase™ has a very high processivity and high rate of 
polymerization. It can efficiently incorporate nucleotide analogs such as dITP and 7-deaza- 
dGTP which are used to resolve regions of compression in sequencing gels. In regions of 
DNA containing a high G+C content, Hoogsteen bond formation can occur which leads to 
compressions in the DNA. These compressions result in aberrant migration patterns of 
oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with 
conventional nucleotides, intrastrand secondary structures during electrophoresis are 
alleviated. In contrast, Klenow does not incorporate these analogs as efficiently. 

[0136] The use of Taq DNA polymerase and mutants thereof is a more recent 
addition to the improvements of the Sanger method (U.S. Patent No. 5,075,216). Taq 
polymerase is a thermostable enzyme which works efficiently at 70-75°C. The ability to 
catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing 
templates which have extensive secondary structures at 37°C (the standard temperature used 
for Klenow and Sequenase™ reactions). Taq polymerase, like Sequenase™, has a high 
degree of processivity and like Sequenase 2.0, it lacks 3' to 5' nuclease activity. The thermal 
stability of Taq and related enzymes (such as Tth and Thermosequenase™) provides an 
advantage over T7 polymerase (and all mutants thereof) in that these thermally stable 
enzymes can be used for cycle sequencing which amplifies the DNA during the sequencing 
reaction, thus allowing sequencing to be performed on smaller amounts of DNA. 
Optimization of the use of Taq in the standard Sanger Method has focused on modifying Taq 
to eliminate the intrinsic 5'-3' exonuclease activity and to increase its ability to incorporate 
ddNTPs to reduce incorrect termination due to secondary structure in the single-stranded 
template DNA (EP 0 655 506 Bl). The introduction of fluorescently labeled nucleotides has 
further allowed the introduction of automated sequencing, which increases productivity. 

[0137] DNA that is flanked by vector or PCR™ primer DNA of 
known sequence can undergo Sanger termination reactions initiated from one end using a 
primer complementary to those known sequences. These sequencing primers are 
inexpensive, because the same primers can be used for DNA cloned into the same vector or 
PCR™ amplified using primers with common terminal sequences. Commonly-used 
electrophoretic techniques for separating the dideoxyribonucleotide-terminated DNA 
molecules are limited to resolving sequencing ladders shorter than 500 - 1000 bases. 
Therefore only the first 500 - 1000 nucleic acid bases can be "read" by this or any other 
method of sequencing the DNA. Sequencing DNA beyond the first 500 - 1000 bases 
requires special techniques. 

C. Other base-specific termination methods 

[0138] Other termination reactions have been proposed. One group of proposals 
involves substituting thiolated or boronated base analogs that resist exonuclease activity. 
After incorporation reactions very similar to Sanger reactions a 3' to 5' exonuclease is used 
to resect the synthesized strand to the point of the last base analog. These methods have no 
substantial advantage over the Sanger method. 

[0139] Methods have been proposed to reduce the number of electrophoretic 
separations required to sequence large amounts of DNA. These include multiplex sequencing 
of large numbers of different molecules on the same electrophoretic device, by attaching 
unique tags to different molecules so that they can be separately detected. Commonly, 
different fluorescent dyes are used to multiplex up to 4 different types of DNA molecules in a 
single electrophoretic lane or capillary (U.S. Patent No. 4,942,124). Less commonly, the 
DNA is tagged with large number of different nucleic acid sequences during cloning or 
PCR™ amplification, and detected by hybridization (U.S. Patent No. 4,942,124) or by mass 
spectrometry (U.S. Patent No. 4,942,124). 

[0140] In principle, the sequence of a short fragment can be read by hybridizing 
different oligonucleotides to the unknown sequence and deciphering the information to 
reconstruct the sequence. This "sequencing by hybridization" is limited to fragments of DNA 
< 50 bp in length. It is difficult to amplify such short pieces of DNA for sequencing. 
However, even if sequencing many random 50 bp pieces were possible, assembling the short, 
sometimes overlapping sequences into the complete sequence of a large piece of DNA would 
be impossible. The use of sequencing by hybridization is currently limited to re-sequencing, 
that is, testing the sequence of regions that have already been sequenced. 

D. Preparing DNA for determining long sequences 

[0141] Because it is currently very difficult to separate DNA molecules longer 
than 1000 bases with single-base resolution, special methods have been devised to sequence 
DNA regions within larger DNA molecules. The "primer walking" method initiates the 
Sanger reaction at sequence-specific sites within long DNA. However, most emphasis is on 
methods to amplify DNA in such a way that one of the ends originates from a specific 
position within the long DNA molecule. 

1. Primer walking 

[0142] Once part of a sequence has been determined (e.g., the terminal 500 
bases), a custom sequencing primer can be made that is complementary to the known part of 
the sequence, and used to prime a Sanger dideoxyribonucleotide termination reaction that 
extends further into the unknown region of the DNA. This procedure is called "primer 
walking." The requirement to synthesize a new oligonucleotide every 400 - 1000 bp makes 
this method expensive. The method is slow, because each step is done in series rather than in 
parallel. In addition, each new primer has a significant failure rate until optimum conditions 
are determined. Primer walking is primarily used to fill gaps in the sequence that have not 
been read after shotgun sequencing or to complete the sequencing of small DNA fragments 
<5,000 bp in length. However, WO 00/60121 addresses this problem using a single synthetic 
primer for PCR™ to genome walk to unknown sequences from a known sequence. The 
5'-blocked primer anneals to the denatured template and is extended, followed by coupling to 
the extended product of a 3'-blocked oligonucleotide of known sequence, thereby creating a 
single stranded molecule having had only a single region of known target DNA sequence. By 
sequencing an amplified product from the extended product having the coupled 3'-blocked 
oligonucleotide, the process can be applied reiteratively to elucidate consecutive adjacent 
unknown sequences. 

2. PCR™ amplification 

[0143] PCR™ can be used to amplify a specific region within a large DNA 
molecule. Because the PCR™ primers must be complementary to the DNA flanking the 
specific region, this method is usually used only to prepare DNA to "re-sequence" a region of 
DNA. 

3. Nested deletion and transposon insertion 

[0144] As described above, cloning or PCR™ amplification of long DNA with 
nested deletions brought about by nuclease cleavage or transposon insertion enables ordered 
libraries of DNA to be created. When exonuclease is used to progressively digest one end of 
the DNA there is some control over the position of one end of the molecule. However the 
exonuclease activity cannot be controlled to give a narrow distribution in molecular weights, 
so typically the exonuclease-treated DNA is separated by electrophoresis to better select the 
position of the end of the DNA samples before cloning. Because transposon insertion is 
nearly random, clones containing inserted elements have to be screened before choosing 
which clones have the insertion at a specific internal site. The labor-intense steps of clone 
screening make these methods impractical except for DNA less than about 10 kb long. 

4. Junction-fragment DNA probes for preparing ordered DNA clones 

[0145] Collins and Weissman have proposed to use "junction-fragment DNA 
probes and probe clusters" (U.S. Patent No. 4,710,465) to fractionate large regions of 
chromosomes into ordered libraries of clones. That patent proposes to size fractionate 
genomic DNA fragments after partial restriction digestion, circularize the fragments in each 
size-fraction to form junctions between sequences separated by different physical distances in 
the genome, and then clone the junctions in each size fraction. By screening all the clones 
derived from each size-fraction using a hybridization probe from a known sequence, ordered 
libraries of clones could be created having sequences located different distances from the 
known sequence. Although this method was designed to walk megabase distances along 
chromosomes, it was never put into practical use because of the necessity to maintain and 
screen hundreds of thousands of clones from each size fraction. In addition, cross 
hybridization would be expected to yield a large fraction of false positive clones. 

5. Shotgun cloning 

[0146] The only practical method for preparing DNA longer than 5-20 kb for 
sequencing is subcloning the source DNA as random fragments small enough to be 
sequenced. The large source DNA molecule is fragmented by sonication or hydrodynamic 
shearing, fractionated to select the optimum fragment size, and then subcloned into a 
bacterial plasmid or virus genome (Adams et al., 1994; Primrose, 1998; Cantor and Smith, 
1999). The individual subclones can be subjected to Sanger or other sequencing reactions in 
order to determine sequences within the source DNA. If many overlapping subclones are 
sequenced, the entire sequence for the large source DNA can be determined. The advantages 
of shotgun cloning over the other techniques are: 1) the fragments are small and uniform in 
size so that they can be cloned with high efficiency independent of sequence; 2) the 
fragments can be short enough that both strands can be sequenced using the Sanger reaction; 
3) transformation and growth of many clones is rapid and inexpensive; and 4) clones are very 
stable. 

E. Genomic sequencing 

[0147] Current techniques to sequence genomes (as well as any DNA larger than 
about 5 kb) depend upon shotgun cloning of small random fragments from the entire DNA. 
Bacteria and other very small genomes can be directly shotgun cloned and sequenced. This is 
called "pure shotgun sequencing." Larger genomes are usually first cloned as large pieces 
and each clone is shotgun sequenced. This is called "directed shotgun sequencing." 

1. Pure shotgun sequencing 

[0148] Genomes up to several millions or billions of base pairs in length can be 
randomly fragmented and subcloned as small fragments (Adams et al., 1994; Primrose, 1998; 
Cantor and Smith, 1999). However, in the process of fragmentation all information about the 
relative positions of the fragment sequences in the native genome is lost. This information 
can be recovered by sequencing with 5 - 10-fold redundancy (i.e., the number of bases 
sequenced in different reactions adds up to 5 to 10 times as many bases as are in the genome) so as 
to generate sufficiently numerous overlaps between the sequences of different fragments that 
a computer program can assemble the sequences from the subclones into large contiguous 
sequences (contigs). However, due to some regions being more difficult to clone than others 
and due to incomplete statistical sampling, there will still be some regions within the genome 
that are not sequenced even after highly redundant sequencing. These unknown regions are 
called "gaps." After assembly of the shotgun sequences into contigs, the sequencing is 
"finished" by filling in the gaps. Finishing must be done by additional sequencing of the 
subclones, by primer walking beginning at the edge of a contig, or by sequencing PCR™ 
products made using primers from the edges of adjacent contigs. 
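
The link between redundancy and remaining gaps can be illustrated with a standard idealized model that is not part of this disclosure: if reads start at uniformly random positions, the number of reads covering a given base is approximately Poisson distributed, so the expected fraction of bases left unsequenced at redundancy c is about e^(-c). The sketch below assumes this model and ignores cloning bias, which the text notes makes real coverage worse. The genome size and read length are hypothetical example values.

```python
import math


def redundancy(num_reads: int, read_len: int, genome_len: int) -> float:
    """Sequencing redundancy: total bases read divided by genome length."""
    return num_reads * read_len / genome_len


def expected_uncovered_fraction(c: float) -> float:
    """Expected fraction of bases with no read, under the Poisson approximation."""
    return math.exp(-c)


# Hypothetical example: a 1 Mb genome sequenced with 500-base reads.
for n_reads in (2_000, 10_000, 20_000):
    c = redundancy(n_reads, 500, 1_000_000)
    print(f"{c:>4.0f}-fold redundancy: {expected_uncovered_fraction(c):.4%} of bases unread")
```

Even at 5 to 10-fold redundancy the expected unread fraction is small but not zero, which is consistent with the need for a separate finishing step.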

[0149] There are several disadvantages to the pure shotgun strategy: 1) as the size 
of the region to be sequenced increases, the effort of assembling a contiguous sequence from 
shotgun reads increases faster than N ln N, where N is the number of reads; 2) repetitive DNA 
and sequencing errors can cause ambiguities in sequence assembly; and 3) because subclones 
from the entire genome are sequenced at the same time and significant redundancy of 
sequencing is necessary to get contigs of moderate size, about 50% of the sequencing has to 
be finished before the sequence accuracy and the contig sizes are sufficient to get substantial 
information about the genome. Focusing the sequencing effort on one region is impossible. 

2. Directed shotgun sequencing 

[0150] The directed shotgun strategy, adopted by the Human Genome Project, 
reduces the difficulty of sequence assembly by limiting the analysis to one large clone at a 
time. This "clone-by-clone" approach requires four steps: 1) large-insert cloning, comprised 
of a) random fragmentation of the genome into segments 100,000 - 300,000 bp in size, b) 
cloning of the large segments, and c) isolation, selection and mapping of the clones; 2) 
random fragmentation and subcloning of each clone as thousands of short subclones; 3) 
sequencing random subclones and assembly of the overlapping sequences into contiguous 
regions; and 4) "finishing" the sequence by filling the gaps between contiguous regions and 
resolving inaccuracies. The positions of the sequences of the large clones within the genome 
are determined by the mapping steps, and the positions of the sequences of the subclones are 
determined by redundant sequencing of the subclones and computer assembly of the 
sequences of individual large clones. Substantial initial investment of resources and time are 
required for the first two steps before sequencing begins. This inhibits sequencing DNA 
from different species or individuals. Sequencing random subclones is highly inefficient, 
because significant gaps exist until the subclones have been sequenced to about 7X 
redundancy. Finishing requires "smart" workers and effort equivalent to an additional ~ 3X 
sequencing redundancy. 

[0151] The directed shotgun sequencing method is more likely to finish a large 
genome than is pure shotgun sequencing. For the human genome, for example, the computer 
effort for directed shotgun sequencing is more than 20 times less than that required for pure 
shotgun sequencing. 

[0152] There is an even greater need to simplify the sequencing and finishing 
steps of genomic sequencing. In principle, this can be done by creating ordered libraries of 
DNA, giving uniform (rather than random) coverage, which would allow accurate sequencing 
with only about 3 fold redundancy and eliminate the finishing phase of projects. Current 
methods to produce ordered libraries are impractical, because they can cover only short 
regions (~ 5,000 bp) and are labor-intensive. 

F. Resequencing of DNA 

[0153] The presence of a known DNA sequence or variation of a known sequence 
can be detected using a variety of techniques that are more rapid and less expensive than de 
novo sequencing. These "re-sequencing" techniques are important for health applications, 
where determination of which allele or alleles are present has prognostic and diagnostic 
value. 

1. Microarray detection of specific DNA sequences 

[0154] The DNA from an individual human or animal is amplified, usually by 
PCR™, labeled with a detectable tag, and hybridized to spots of DNA with known sequences 
bound to a surface (Primrose, 1998; Cantor and Smith, 1999). If the individual's DNA 
contains sequences that are complementary to those on one or more spots on the DNA array, 
the tagged molecules are physically detected. If the individual's amplified DNA is not 
complementary to the probe DNA in a spot, the tagged molecules are not detected. 
Microarrays of different design have different sensitivities to the amount of tested DNA and 
the exact amount of sequence complementarity that is required for a positive result. The 
advantage of the microarray resequencing technique is that many regions of an individual's 
DNA can be simultaneously amplified using multiplex PCR™, and the mixture of amplified 
genetic elements hybridized simultaneously to a microarray having thousands of different 
probe spots, such that variations at many different sites can be simultaneously detected. 

[0155] One disadvantage to using PCR™ to amplify the DNA is that only one 
genetic element can be amplified in each reaction, unless multiplex PCR™ is employed, in 
which case only as many as 10-50 loci can be simultaneously amplified. For certain 
applications, such as SNP (single nucleotide polymorphism) screening, it would be 
advantageous to simultaneously amplify 1,000 - 100,000 elements and detect the amplified 
sequences simultaneously. A second disadvantage to PCR™ is that only a limited number of 
DNA bases can be amplified from each element (usually <2000 bp). Many applications 
require re-sequencing entire genes, which can be up to 200,000 bp in length. 

2. Other methods of re-sequencing 

[0156] Other methods such as mass spectrometry, secondary structure 
conformation polymorphism, ligation amplification, primer extension, and target-dependent 
cleavage can be used to detect sequence polymorphisms. All these methods either require 
initial amplification of one or more specific genetic elements by PCR™ or incorporate other 
forms of amplification that have the same deficiencies of PCR™, because they can amplify 
only a very limited region of the genome at one time. 

III. Amplification of Nucleic Acids 

[0157] Nucleic acids useful as templates for amplification may be isolated from 
cells, tissues or other samples according to standard methodologies (Sambrook et al., 1989). 
In certain embodiments, analysis is performed on whole cell or tissue homogenates or 
biological fluid samples without substantial purification of the template nucleic acid. The 
nucleic acid can be genomic DNA or fractionated or whole cell RNA. Where RNA is used, it 
may be desired to first convert the RNA to a complementary DNA. 

[0158] The term "primer," as used herein, is meant to encompass any nucleic acid 
that is capable of priming the synthesis of a nascent nucleic acid in a template-dependent 
process. Typically, primers are oligonucleotides from ten to twenty and/or thirty base pairs in 
length, but longer sequences can be employed. Primers may be provided in double-stranded 
and/or single-stranded form, although the single-stranded form is preferred. 

[0159] Pairs of primers designed to selectively hybridize to nucleic acids are 
contacted with the template nucleic acid under conditions that permit selective hybridization. 
Depending upon the desired application, high stringency hybridization conditions may be 
selected that will only allow hybridization to sequences that are completely complementary to 
the primers. In other embodiments, hybridization may occur under reduced stringency to 
allow for amplification of nucleic acids containing one or more mismatches with the primer 
sequences. Once hybridized, the template-primer complex is contacted with one or more 
enzymes that facilitate template-dependent nucleic acid synthesis. Multiple rounds of 
amplification, also referred to as "cycles," are conducted until a sufficient amount of 
amplification product is produced. 

[0160] The amplification product may be detected or quantified. In certain 
applications, the detection may be performed by visual means. Alternatively, the detection 
may involve indirect identification of the product via chemiluminescence, radioactive 
scintigraphy of incorporated radiolabel or fluorescent label or even via a system using 
electrical and/or thermal impulse signals (Affymax technology). 

[0161] A number of template dependent processes are available to amplify the 
oligonucleotide sequences present in a given template sample. One of the best known 
amplification methods is the polymerase chain reaction (referred to as PCR™) which is 
described in detail in U.S. Patent Nos. 4,683,195, 4,683,202 and 4,800,159, and in Innis et 
al., 1990, each of which is incorporated herein by reference in its entirety. Briefly, two 
synthetic oligonucleotide primers, which are complementary to two regions of the template 
DNA (one for each strand) to be amplified, are added to the template DNA (that need not be 
pure), in the presence of excess deoxynucleotides (dNTPs) and a thermostable polymerase, 
such as, for example, Taq (Thermus aquaticus) DNA polymerase. In a series (typically 30-35) 
of temperature cycles, the target DNA is repeatedly denatured (around 90°C), annealed to 
the primers (typically at 50-60°C) and a daughter strand extended from the primers (72°C). 
As the daughter strands are created they act as templates in subsequent cycles. Thus, the 
template region between the two primers is amplified exponentially, rather than linearly. 
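
As a purely illustrative aid, the exponential character of the amplification described above can be sketched numerically. The function names and the per-cycle efficiency parameter below are assumptions introduced for illustration only; they are not part of the disclosed methods, and real reactions plateau well before the idealized value is reached.

```python
def exponential_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase when every cycle multiplies the target by (1 + efficiency).

    efficiency = 1.0 models perfect doubling in each cycle (an assumption).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


def linear_fold(cycles: int) -> int:
    """Fold increase if only the original template were copied once per cycle."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1 + cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, exponential_fold(n), linear_fold(n))
```

With perfect doubling, 30 cycles correspond to a 2^30-fold increase, whereas copying only the original template would give a 31-fold increase over the same number of cycles.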

[0162] A reverse transcriptase PCR™ amplification procedure may be performed 
to quantify the amount of mRNA amplified. Methods of reverse transcribing RNA into 
cDNA are well known and described in Sambrook et al., 1989. Alternative methods for 
reverse transcription utilize thermostable DNA polymerases. These methods are described in 
WO 90/07641. Polymerase chain reaction methodologies are well known in the art. 
Representative methods of RT-PCR™ are described in U.S. Patent No. 5,882,864. 

A. LCR 

[0163] Another method for amplification is the ligase chain reaction ("LCR"), 
disclosed in European Patent Application No. 320,308, incorporated herein by reference. In 
LCR, two complementary probe pairs are prepared, and in the presence of the target 
sequence, each pair will bind to opposite complementary strands of the target such that they 
abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By 
temperature cycling, as in PCR™, bound ligated units dissociate from the target and then 
serve as "target sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750, 
incorporated herein by reference, describes a method similar to LCR for binding probe pairs 
to a target sequence. 

B. Qbeta Replicase 

[0164] Qbeta Replicase, described in PCT Patent Application No. 
PCT/US87/00880, also may be used as still another amplification method in the present 
invention. In this method, a replicative sequence of RNA which has a region complementary 
to that of a target is added to a sample in the presence of an RNA polymerase. The 
polymerase will copy the replicative sequence which can then be detected. 

C. Isothermal Amplification 

[0165] An isothermal amplification method, in which restriction endonucleases 
and ligases are used to achieve the amplification of target molecules that contain nucleotide 
thiophosphates in one strand of a restriction site also may be useful in the amplification of 
nucleic acids in the present invention. Such an amplification method is described by Walker 
et al., 1992, incorporated herein by reference. 

D. Strand Displacement Amplification 

[0166] Strand Displacement Amplification (SDA) is another method of carrying 
out isothermal amplification of nucleic acids which involves multiple rounds of strand 
displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain 
Reaction (RCR), involves annealing several probes throughout a region targeted for 
amplification, followed by a repair reaction in which only two of the four bases are present. 
The other two bases can be added as biotinylated derivatives for easy detection. A similar 
approach is used in SDA. 

E. Cyclic Probe Reaction 

[0167] Target specific sequences can also be detected using a cyclic probe 
reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a 
middle sequence of specific RNA is hybridized to DNA which is present in a sample. Upon 
hybridization, the reaction is treated with RNase H, and the products of the probe identified 
as distinctive products which are released after digestion. The original template is annealed 
to another cycling probe and the reaction is repeated. 

F. Transcription-Based Amplification 

[0168] Other nucleic acid amplification procedures include transcription-based 
amplification systems (TAS), including nucleic acid sequence based amplification (NASBA) 
and 3SR (Kwoh et al., 1989; PCT Patent Application WO 88/10315, each 
incorporated herein by reference). 

[0169] In NASBA, the nucleic acids can be prepared for amplification by standard 
phenol/chloroform extraction, heat denaturation of a clinical sample, treatment with lysis 
buffer and minispin columns for isolation of DNA and RNA or guanidinium chloride 
extraction of RNA. These amplification techniques involve annealing a primer which has 
target specific sequences. Following polymerization, DNA/RNA hybrids are digested with 
RNase H while double stranded DNA molecules are heat denatured again. In either case the 
single stranded DNA is made fully double stranded by addition of second target specific 
primer, followed by polymerization. The double-stranded DNA molecules are then multiply 
transcribed by an RNA polymerase, such as T7 or SP6. In an isothermal cyclic reaction, the 
RNAs are reverse transcribed into double stranded DNA, and transcribed once again with an 
RNA polymerase, such as T7 or SP6. The resulting products, whether truncated or complete, 
indicate target specific sequences. 

G. Rolling Circle Amplification 

[0170] Rolling circle amplification (U.S. Patent No. 5,648,245) is a method to 
increase the effectiveness of the strand displacement reaction by using a circular template. 
The polymerase, which does not have a 5' exonuclease activity, makes multiple copies of the 
information on the circular template as it makes multiple continuous cycles around the 
template. The length of the product is very large— typically too large to be directly 
sequenced. Additional amplification is achieved if a second strand displacement primer is 
added to the reaction using the first strand displacement product as a template. 

H. Other Amplification Methods 

[0171] Other amplification methods, as described in British Patent Application 
No. GB 2,202,328, and in PCT Patent Application No. PCT/US89/01025, each incorporated 
herein by reference, may be used in accordance with the present invention. In the former 
application, "modified" primers are used in a PCR™ like, template and enzyme dependent 
synthesis. The primers may be modified by labeling with a capture moiety (e.g., biotin) 
and/or a detector moiety (e.g., enzyme). In the latter application, an excess of labeled probes 
are added to a sample. In the presence of the target sequence, the probe binds and is cleaved 
catalytically. After cleavage, the target sequence is released intact to be bound by excess 
probe. Cleavage of the labeled probe signals the presence of the target sequence. 

[0172] Miller et al., PCT Patent Application WO 89/06700 (incorporated herein 
by reference) disclose a nucleic acid sequence amplification scheme based on the 
hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") 
followed by transcription of many RNA copies of the sequence. This scheme is not cyclic, 
i.e., new templates are not produced from the resultant RNA transcripts. 

[0173] Other suitable amplification methods include "RACE" and "one-sided 
PCR™" (Frohman, 1990; Ohara et al., 1989, each herein incorporated by reference). 
Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid 
having the sequence of the resulting "di-oligonucleotide", thereby amplifying the 
di-oligonucleotide, also may be used in the amplification step of the present invention (Wu et 
al., 1989, incorporated herein by reference). 

EXAMPLES 

[0174] The following examples are included to demonstrate preferred 
embodiments of the invention. It should be appreciated by those of skill in the art that the 
techniques disclosed in the examples which follow represent techniques discovered by the 
inventor to function well in the practice of the invention, and thus can be considered to 
constitute preferred modes for its practice. However, those of skill in the art should, in light 
of the present disclosure, appreciate that many changes can be made in the specific 
embodiments which are disclosed and still obtain a like or similar result without departing 
from the spirit and scope of the invention. 

EXAMPLE 1: PREPARATION OF TRF LIBRARY FROM E. COLI GENOMIC DNA 
BY HYDRODYNAMIC SHEARING 

[0175] This example describes the preparation of a TRF library with an average size of 3 
kb from E. coli genomic DNA, particularly by hydrodynamic shearing (HydroShear device, 
GeneMachines) and terminal transferase mediated tailing with deoxyguanosine triphosphate 
(dGTP). 

[0176] The prepared library allows reproducible amplification of many nested 
DNA mixtures using one sequence-specific primer and the universal homopolymeric primer C10 
(containing ten cytosines). Sequencing of these mixtures using the same primer generates 
600 to 800 base reads adjacent to chosen kernel primers. 

[0177] DNA is isolated by standard purification from E. coli, such as strain 
MG1655 (purchased from Yale University), and diluted to 100 ng/µl in TE-L buffer (10 mM 
Tris-HCl, 0.1 mM EDTA, pH 7.5). The sample is incubated at 45°C for 15 min. During the 
course of the incubation the DNA sample is vortexed at maximum speed for 30 sec every 3 
min. The sample is then centrifuged at 16,000 x g for 15 min at room temperature. The 
supernatant is slowly aspirated and transferred to a clean tube, sacrificing the last 30 
microliters. 

[0178] Aliquots of 150 µl of the DNA prep are subjected to mechanical 
fragmentation on a HydroShear device (GeneMachines) for 20 passes at a speed code of 9 
following the manufacturer's protocol. The sheared DNA has an average size of about 3 kb 
as predicted by the manufacturer and confirmed by gel electrophoresis. To prevent DNA 
carry-over contamination, the shearing assembly of the HydroShear is washed 3 times each 
with 0.2 M HCl, 0.2 M NaOH, and 5 times with TE-L buffer prior to and after fragmentation. 
All solutions are 0.2 µm filtered before use. 

[0179] Homopolymeric G tails, consisting of about 10 to 15 nucleotides, are 
enzymatically added to the 3'-termini of the DNA fragments by terminal deoxynucleotidyl 
transferase. DNA template at 80 ng/µl is incubated with 10 units of New England Biolabs 
(NEB) terminal transferase in 1x NEB restriction buffer #4 containing 0.25 mM CoCl2, and 
2 µM dGTP in a final volume of 50 µl for 15 min at 37°C. The reaction is stopped by adding 
5 µl of 0.5 M EDTA, pH 8.0. The sample is supplemented with 1/10 volume of 3 M sodium 
acetate, pH 5.0, precipitated with 2.5 volumes of ethanol in the presence of 2 µg glycogen, 
centrifuged 30 min at 16,000 x g, and the pellet is then washed twice with 70% ethanol at 
room temperature and dissolved in TE-L buffer. 
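
The stoichiometry of the tailing reaction above can be checked with a short calculation. The following is a minimal sketch under stated assumptions: that 80 ng/µl refers to the DNA concentration in the final 50 µl reaction, that the fragments average 3 kb, and that one base pair of double-stranded DNA has a mass of about 660 g/mol. The function names are hypothetical and chosen for illustration only.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (common approximation, assumed here)


def three_prime_ends_pmol(dna_ng: float, mean_length_bp: float) -> float:
    """Picomoles of 3' ends in a sample of linear double-stranded fragments."""
    # ng -> pmol: (ng * 1e-9 g) / (g/mol) * 1e12 pmol/mol = ng * 1e3 / (g/mol)
    pmol_molecules = dna_ng * 1000.0 / (mean_length_bp * AVG_MASS_PER_BP)
    return 2.0 * pmol_molecules  # each linear duplex has two 3' ends


def dgtp_pmol(conc_um: float, volume_ul: float) -> float:
    """Picomoles of dGTP: micromolar times microliters gives picomoles."""
    return conc_um * volume_ul


def max_mean_tail_length(dna_ng: float, mean_length_bp: float,
                         dgtp_um: float, volume_ul: float) -> float:
    """Upper bound on the mean tail length if all dGTP were incorporated."""
    return dgtp_pmol(dgtp_um, volume_ul) / three_prime_ends_pmol(dna_ng, mean_length_bp)


if __name__ == "__main__":
    dna_ng = 80.0 * 50.0  # assumed: 80 ng/ul in the 50 ul reaction
    print(round(max_mean_tail_length(dna_ng, 3000.0, 2.0, 50.0), 2))
```

Under these assumptions the dGTP supply limits the mean tail to roughly 25 nucleotides per 3' end, which is of the same order as the tail length of about 10 to 15 nucleotides stated above.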

EXAMPLE 2: AMPLIFICATION AND SEQUENCING OF E. COLI DNA REGIONS 
WITH SPECIFIC PRIMERS FROM TRF LIBRARY PREPARED BY 
HYDRODYNAMIC SHEARING 

[0180] This example describes 
amplification and sequencing of specific regions from an E. coli TRF library. During PCR™ 
amplification a specific primer is used along with a 10 base homopolymeric cytosine primer 
(C10 primer). The resulting amplicon is then utilized as template for cycle sequencing with 
the same specific primer used in the PCR™. 

[0181] Amplification primers are designed using Oligo version 6.53 primer 
analysis software (Molecular Biology Insights, Inc., Cascade, CO). Primers are 21 to 23 bases 
long, having high internal stability, low 3'-end stability, and melting temperatures of 57 to 
62°C (at 50 mM salt and 2 mM MgCl2). Primers are designed to meet all standard criteria, 
such as low primer-dimer and hairpin formation, and are filtered against an E. coli genomic 
5-mer frequency database. 
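
For illustration only, two of the simpler primer properties mentioned above (length and base composition) can be computed as follows, together with the reverse complement that shows why a poly-C primer anneals to a poly-G tail. This is a hedged sketch, not the Oligo software's algorithm: the function names are hypothetical, and stability, melting temperature, and k-mer frequency filtering are deliberately not modeled. The example sequence is primer S1 from Table I.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalize(seq: str) -> str:
    """Remove whitespace (e.g., triplet spacing) and upper-case the sequence."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = normalize(seq)
    if not s:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in s) / len(s)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(seq)))


def length_in_window(seq: str, min_len: int = 21, max_len: int = 23) -> bool:
    """True if the primer length lies within the stated 21 to 23 base window."""
    return min_len <= len(normalize(seq)) <= max_len


if __name__ == "__main__":
    s1 = "ATG TGG CGC GTA AAC TAT TGA"  # primer S1, Table I
    print(len(normalize(s1)), round(gc_fraction(s1), 3), length_in_window(s1))
    print(reverse_complement("C" * 10))  # the C10 primer pairs with a G tail
```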

[0182] For the purposes of non-limiting illustration, oligonucleotides for PCR™ 
amplifications are designed to target amplicons of six specific regions of the E. coli DNA: 
primers S1, S3, S7, S31, S36, and S41 (Table I). 

Table I. Primers used for Positional Amplification and Sequencing of E. coli Genomic 
Regions, Human tp53 Gene Regions and Corn Genomic Regions from TRF Libraries 

Primer* ID       Sequence (5'-3')                                  Application 

S1               ATG TGG CGC GTA AAC TAT TGA (SEQ ID NO:1)         primary amplification of target region at contig 1 of E. coli genome 
S3               CTG GCG GGA GTG AGT AGC AA (SEQ ID NO:2)          primary amplification of target region at contig 2 of E. coli genome 
S7               TTC AAC TGG CGC AGG GCT AT (SEQ ID NO:3)          primary amplification of target region at contig 4 of E. coli genome 
S31              TCT GCC AGC GCC CGT AAC AA (SEQ ID NO:4)          primary amplification of target region at contig 12 of E. coli genome 
S36              CCA GCG CAT TCT GAC TAA ACC (SEQ ID NO:5)         primary amplification of target region at contig 13 of E. coli genome 
S41              TCG CCC ATC TTC TCA CGT AG (SEQ ID NO:6)          primary amplification of target region at contig 14 of E. coli genome 
T4               GGT AGC CGT TGA GTC ACC CTC (SEQ ID NO:7)         walking primer for S3 amplicon 645 bp apart from S3 
T5               GCC GCA ATC AAT ACG ACC TGT (SEQ ID NO:8)         walking primer for S3 amplicon 1272 bp apart from S3 
HS3+             AGA AAA GCT CCT GAG GTG TAG AC (SEQ ID NO:9)      primary amplification of target region encompassing exons 5, 6, and 7 of the human tp53 gene 
HS4+             CTC ATC TTG GGC CTG TGT TAT CT (SEQ ID NO:10)     primary amplification of target region at exons 7, 8, and 9 of the human tp53 gene, also nested for priming site HS3+ 
HB7-             CTG GGC CAG CAA GAC TTG ACA AC (SEQ ID NO:11)     primary amplification of target region at exon 11 of the human tp53 gene 
HS2+             GAT CGA GAC CAT CCT GGC TAA CGG (SEQ ID NO:12)    nested for priming site HS3+ 
HS14177+         TGG GCC CAC CTC TTA CCG ATT TCT (SEQ ID NO:13)    nested for priming site HS4+ 
HB8-             AGC TGC CCA ACT GTA GAA ACT AC (SEQ ID NO:14)     nested for priming site HB7- 
asg60.sl 133+    TAG TGT GCC CAG TGG TTA TAT TG (SEQ ID NO:15)     primary amplification of corn region 1 
asg60.sl 405+    GCC GTC CGA TGA GAT CAC TGT AG (SEQ ID NO:16)     nested amplification of corn region 1 
ZeaX254-         TCT CAA GTG GTC CGC TAT TAT TC (SEQ ID NO:17)     primary amplification of corn region 2 
ZeaX211-         GCC CGC GCA AGC CAT CCA TAG AG (SEQ ID NO:18)     primary amplification of corn region 2 and nested for priming site ZeaX254- 
ZeaX149-         ACC GAA TCC TCC TGC CGC AAA GT (SEQ ID NO:19)     nested amplification of corn region 2 
ZeaX49-          CTA AAA GTC CAT AAC GGG ATG AC (SEQ ID NO:20)     nested amplification of corn region 2 
MubGl 218-       TGA CAC AAC GGC TAC GAT TTA AT (SEQ ID NO:21)     primary amplification of corn region 3 and nested for priming site MubGl 356- 
MubGl 317-       GCC GCC GGA TTC AGC TAA ATT GT (SEQ ID NO:22)     primary amplification of corn region 3 and nested for priming site MubGl 356- 
MubGl 356-       CAC GAC CGG GTC ACG CTG CAC TG (SEQ ID NO:23)     primary amplification of corn region 3 
MubGl 24-        GGC CGG GAC CGT TGA ACT AGA AC (SEQ ID NO:24)     nested amplification of corn region 3 at priming site MubGl 218- 
MubGl 393+       TTT GGC CAT GAG TCG TGA CTT AG (SEQ ID NO:25)     primary amplification of corn region 4 
MubGl 395+       TGG CCA TGA GTC GTG ACT TAG TT (SEQ ID NO:26)     primary amplification of corn region 4 
MubGl 428+       GAC CGG TTC TCC TAG CTT GTT (SEQ ID NO:27)        nested amplification of corn region 4 
MubGl 430+       CCG GTT CTC CTA GCT TGT TCT AC (SEQ ID NO:28)     nested amplification of corn region 4 

*All primers are synthesized and purified by HPSF at MWG Biotech 

[0183] PCR™ amplification is carried out with 200 nM specific primer, 200 nM 
of universal C-10 primer, and 40 ng of E. coli TRF library DNA (described in Example 1) in 
a final volume of 25 µl under standard Titanium Taq Polymerase conditions (Clontech). 
After initial denaturation at 94°C for 2 min, samples are subjected to 32 cycles at 94°C for 10 
sec, 68°C for 2 min and 15 sec, and a final extension at 72°C for 2 min. Control reactions are 
performed under the same conditions with 200 nM of C-10 primer alone. Aliquots of 12 µl 
of each PCR™ reaction are analyzed by electrophoresis on a 1% agarose gel (FIG. 8 and 
FIG. 9). As shown, a specific discrete band is amplified from fragmented non-tailed DNA 
(FIG. 9), whereas a uniform smear is obtained when TRF library DNA is used as the 
template. This smear reflects the random process of fragmentation. 

[0184] The PCR™ amplification products are quantified from the stained gel by 
comparison with standard DNA markers using the volume quantitation tool of Fluor-S 
Imager software (BioRad). The PCR™ products are purified free of primers and nucleotides 
by the QIAquick PCR™ purification kit (Qiagen), eluted in 30 µl of 1 mM Tris-HCl, pH 7.5 
and used as template for cycle sequencing with the same primers used for PCR™. 

[0185] Cycle sequencing is performed by mixing 2 to 11 µl of sequencing 
template, containing 40 to 250 ng of total DNA, with 1 µl of 5 µM each sequencing primer 
and 8 µl of DYEnamic ET terminator reagent mix (Amersham Pharmacia Biotech) in 96 well 
plates in a final volume of 20 µl. Amplification is performed for 30 cycles at: 94°C for 20 sec, 
58°C for 15 sec, and 60°C for 75 sec. Samples are precipitated with 70% ethanol and 
analyzed on a MegaBACE 1000 capillary electrophoresis sequencing system (Amersham 
Pharmacia Biotech) using the manufacturer's protocol. 

[0186] Table II shows a summary of the sequencing results obtained from the six 
regions of the E. coli genome. 

Table II. Summary of the Sequencing Results for Specific Regions of the E. coli 
Genome and Human tp53 Gene Amplified from TRF Libraries Prepared by 
Hydrodynamic Shearing 

Sequenced Region*            Read Length at Phred >20**    Accuracy of the Read (% match with published sequence) 

E. coli Genomic 
S1 Region                    387                           99% 
S3 Region                    720 +/- 36                    99% 
S7 Region                    665 +/- 29                    99% 
S31 Region                   736 +/- 22                    99% 
S36 Region                   618 +/- 26                    99% 
S41 Region                   433 +/- 71                    99% 
T4 Region                    574 +/- 38                    98% 
T5 Region                    404                           98% 

Human tp53 
Region 1 (exons 6, 7, 8)     705 +/- 59                    98% 
Region 2 (exons 7, 8, 9)     683 +/- 64                    98% 
Region 3 (exon 11)           267 n/a                       99% 

* Refer to FIG. 11 

**Mean +/- S.D. from multiple reads (see text) for human regions 1 and 2, bacterial 
regions S3, S7, S31, S36, S41 and T4, and single read for human region 3, and bacterial 
regions T5 and S1 

[0187] The average read length of the analyzed sequences is above 600 bases. A 
sequence is considered to be a failure if 100 or fewer bases are identifiable. Valid sequencing 
reads were constrained to a preset threshold score of >20 using the Phred algorithm (CodonCode 
Corporation, Dedham, MA), which corresponds to an error probability of 1%. 
Sequence accuracy as compared to the published E. coli K12 MG1655 sequences is equal to or 
greater than 98%. 
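
The relationship between a Phred score and the error probability cited above follows the standard definition Q = -10 log10(P). The following minimal sketch illustrates that conversion; the function names are hypothetical, and counting bases at or above (rather than strictly above) the threshold is an assumption made for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability for a Phred quality score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def bases_at_or_above(qualities, threshold: int = 20) -> int:
    """Number of bases whose quality meets the threshold (>= is assumed here)."""
    return sum(1 for q in qualities if q >= threshold)


if __name__ == "__main__":
    print(phred_to_error_probability(20))        # score of 20
    print(bases_at_or_above([10, 20, 30, 19]))   # toy quality values
```

A score of 20 therefore corresponds to one expected miscall per 100 bases, and a score of 30 to one per 1,000.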

[0188] Thus, this example demonstrates that specific genomic regions can be 
amplified and sequenced with a high level of accuracy and long read length from a TRF 
library generated from bacterial DNA by hydrodynamic shearing. 

EXAMPLE 3: AMPLIFICATION AND SEQUENCING BY PRIMER WALKING 
WITHIN THE DNA AMPLICONS GENERATED FROM TRF LIBRARY 

[0189] This example describes the amplification and sequencing of a specific 
region from an E. coli TRF library (prepared by hydrodynamic shearing) by a primer walking 
approach. During Touch Down PCR™ (TD PCR™) amplification, the specific primer is 
used along with the universal 10-mer poly-C (C10) primer. TD PCR™ conditions are chosen 
to increase the yield of amplified products. The resulting amplicon is then utilized as template 
for cycle sequencing with primers distal (in the 3' direction) to the amplification primer. The 
distal, or walking, primers are typically spaced to generate overlapping sequencing reads. 
Reads are then combined to form one long, contiguous sequence. 

[0190] Primer S1 is designed to target amplification of one specific region of the E. 
coli DNA amplicon S1 (FIG. 1 and Table I). TD PCR™ amplification is performed with 300 
nM specific primer, 300 nM of universal C10 primer, and 40 ng of E. coli TRF library DNA 
(described in Example 1) in a final volume of 25 µl under standard Titanium Taq Polymerase 
conditions (Clontech). After initial denaturing at 95°C for 2 min, samples are subjected to 20 
cycles at 95°C for 15 sec, 73°C for 2 min and 15 sec, with decreasing temperature of 0.5°C in 
each cycle. The next round of amplification is 25 cycles at 95°C for 15 sec and 60°C for 2 
min, with increasing time of extension of 1 sec each cycle. 
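
The two-phase cycling program described above can be written out cycle by cycle. The sketch below is illustrative only and rests on one assumption not stated in the text: that the first cycle of each phase uses the starting value (73°C, or 2 min) and that the decrement or increment is applied from the second cycle onward. The function names are hypothetical.

```python
def touchdown_temperatures(cycles: int = 20, start_c: float = 73.0,
                           step_c: float = 0.5) -> list:
    """Annealing/extension temperature for each cycle of the touchdown phase."""
    return [start_c - step_c * i for i in range(cycles)]


def extension_times(cycles: int = 25, start_s: int = 120, step_s: int = 1) -> list:
    """Extension time in seconds for each cycle of the second phase."""
    return [start_s + step_s * i for i in range(cycles)]


if __name__ == "__main__":
    temps = touchdown_temperatures()
    times = extension_times()
    print(temps[0], temps[-1])   # first and last touchdown temperatures
    print(times[0], times[-1])   # first and last extension times
    print(len(temps) + len(times), "cycles in total")
```

Under this reading, the touchdown phase ends at 63.5°C and the last extension of the second phase lasts 2 min 24 sec, for 45 cycles in total.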

[0191] The PCR™ product is purified free of primers and nucleotides by 
QIAquick PCR™ purification kit (Qiagen), eluted in 30 µl of 1 mM Tris-HCl, pH 7.5 and 
used as template for cycle sequencing with more distal walking primers. 

[0192] Primers for sequencing and walking within the amplicon S1 are designed 
to be 600 to 700 bp apart from initial primers used for PCR™ amplification or from each 
other (primers T4 and T5; Table I). Cycle sequencing is performed as previously described 
(Example 2). 

[0193] The analyzed genomic region (amplicon S1) is shown in FIG. 10. 
Sequencing of the first region is obtained by using S1 as a sequencing primer. The results are 
presented in Example 2 (see Table II). 

[0194] Sequencing of the second and third regions of the amplicon S1 (see 
FIG. 10) is achieved by using T4 and T5 sequencing ("walking") primers, respectively. Using 
this approach, 2.2 kb are sequenced, of which 1.7 kb represent high quality sequence 
information (Phred score > 20). 
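
Combining overlapping walking reads into one contiguous sequence can be illustrated with a toy example. The sketch below assumes error-free reads supplied in walking order and joins each read to the growing contig at the longest exact suffix-prefix overlap; it is not the assembly procedure actually used, real reads would require error-tolerant alignment, and the function names are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for n in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_reads(reads, min_overlap: int = 3) -> str:
    """Join reads, given in walking order, into a single contiguous sequence."""
    if not reads:
        raise ValueError("no reads supplied")
    contig = reads[0]
    for read in reads[1:]:
        n = overlap_length(contig, read, min_overlap)
        if n == 0:
            raise ValueError("read does not overlap the contig")
        contig += read[n:]  # append only the new, non-overlapping bases
    return contig


if __name__ == "__main__":
    # Toy reads standing in for the reads primed by S1, T4 and T5.
    print(merge_reads(["ACGTACGGA", "CGGATTC", "TTCAG"]))
```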

[0195] Table II shows a summary of the sequencing results obtained for the three 
specific regions of E. coli genome. The average read length of the analyzed sequences is 500 
bases at a threshold score of >20 using the Phred algorithm. Sequence accuracy as compared 
to the published E. coli K12 MG1655 sequences is 98% or greater. 

[0196] Thus, this example demonstrates the ability to "walk" over a distance of 2 kb 
within the amplicons generated from the TRF library. 

EXAMPLE 4: PREPARATION OF TRF LIBRARY FROM HUMAN GENOMIC 
DNA BY HYDRODYNAMIC SHEARING 

[0197] This example describes the preparation of TRF library of average size of 
about 3 kb from human genomic DNA by hydrodynamic shearing. 

[0198] DNA is isolated by standard purification from fresh human lymphocytes 
and diluted to 100 ng/µl in TE-L buffer (10 mM Tris-HCl, pH 7.5; 0.1 mM EDTA, pH 7.5). 
The sample is incubated at 45°C for 15 min. During the course of the incubation, the DNA 
sample is vortexed at maximum speed for 30 sec every 3 min. The sample is then centrifuged 
at 16,000 x g for 15 min at room temperature. To avoid the presence of particulate matter, 
the supernatant is slowly aspirated and transferred to a clean tube, sacrificing the last 50 
microliters. 

[0199] Aliquots of 180 µl of the DNA prep are subjected to mechanical 
fragmentation on a HydroShear device (GeneMachines) for 20 passes at a speed code of 9 
following the manufacturer's protocol. The sheared DNA has an average size of 3 kb as 
predicted by the manufacturer and confirmed by gel electrophoresis. To prevent DNA carry-over 
contamination, the shearing assembly of the HydroShear is washed 3 times each with 0.2 M 
HCl, 0.2 M NaOH, and 5 times with TE-L buffer prior to and after fragmentation. All wash 
solutions are 0.2 µm filtered. 

[0200] Homopolymeric G tails, consisting of 10-15 nucleotides, are enzymatically 
added to the 3'-termini of the DNA fragments by terminal deoxynucleotidyl transferase. 
Template DNA at 20 ng/µl is incubated with 40 units of New England Biolabs (NEB) 
terminal transferase in 1x NEB restriction buffer #4, 0.25 mM CoCl2, and 2 µM dGTP in a 
final volume of 100 µl for 20 min at 37°C. The reaction is stopped by adding 4 µl of 0.5 M 
EDTA, pH 8.0. The sample is supplemented with 1/10 volume of 3 M sodium acetate, pH 
5.0, precipitated with 2.5 volumes of ethanol in the presence of 2 µg glycogen, centrifuged 30 
min at 16,000 x g, and the pellet is then washed twice with 70% ethanol at room 
temperature and dissolved in TE-L buffer. Library DNA is stored at -20°C. 

EXAMPLE 5: POSITIONAL AMPLIFICATION AND SEQUENCING OF HUMAN 
TP53 GENE REGIONS FROM TRF LIBRARY PREPARED BY HYDRODYNAMIC 
SHEARING 

[0201] This example describes amplification and sequencing of specific human 
tp53 gene regions from a TRF library prepared by hydrodynamic shearing. In the primary 
step of PCR™ amplification, a specific proximal primer is used with the universal 10-mer 
poly-C (C10) primer. The amplified DNA is diluted and used as template for nested or 
secondary PCR™ amplification with specific distal primers in conjunction with the C10 
primer. The products of the nested amplification are then utilized as templates for cycle 
sequencing with the same primer used in nested PCR™ or with more distal sequencing 
primers. 

[0202] Amplification primers are designed using Oligo version 6.53 primer 
analysis software (Molecular Biology Insights, Inc.; Cascade, CO). Primers are 21 to 23 
bases long, having high internal stability, low 3'-end stability, and melting temperatures of 
57-62°C (at 50 mM salt and 2 mM MgCl2). Primers are designed to meet all standard 
criteria, such as low primer-dimer and hairpin formation, and are filtered against a human 
genomic database 6-mer frequency table. 

[0203] Oligonucleotides for primary PCR™ amplifications are designed to target 
amplicons of three specific regions of the human tp53 gene: primer HS3+ specific for target 
region encompassing exons 5, 6, and 7, primer HS4+ for exons 7, 8, and 9, and primer HB7- 
for exon 11 (FIG. 11 and Table I). Primary PCR™ is carried out with 240 nM specific 
primer, 100 nM of universal C10 primer, and 200 ng of human TRF library DNA (described 
in Example 4) in a final volume of 25 µl under standard Titanium Taq Polymerase conditions 
(Clontech; Palo Alto, CA). After initial denaturing at 94°C for 2 min, samples are subjected to 
37 cycles at 94°C for 10 sec, 68°C for 2 min and 15 sec, and a final extension at 72°C for 3 
min. Control reactions are performed under the same conditions with 200 ng of fragmented 
but not tailed human DNA as template or with the C10 primer alone. Aliquots of 15 µl of each 
PCR™ reaction are analyzed by electrophoresis on a 1% agarose gel (FIG. 12). As shown, 
specific patterns of discrete bands are amplified from fragmented, non-tailed DNA, whereas a 
uniform smear is obtained when TRF library DNA is used as the template. This smear 
reflects the random process of fragmentation and spans the region ranging from the average 
library size (i.e., 3 kb) down to a few hundred base pairs in size. 

[0204] Attempts to sequence primary amplicons from human TRF library directly 
with either the same primers used for primary amplification or nested primers were 
unsuccessful, which was unlike the sequencing results from bacterial TRF library amplicons 
(Example 2). In the case of the same primer utilized, the sequencing chromatograms are 
mixed, indicating the presence of more than one sequence. In the case of nested primers, the 
signal is too low, even if primer concentration was doubled or the template was increased to 
several hundred nanograms per sequencing reaction. 

[0205] FIG. 13 presents titration of the amount of library DNA used in primary 
PCR™ amplification with HS4+ and C10 primers. As shown, at the lowest amount of DNA 
used (i.e., 50 ng), there is no amplification of discrete bands in the control sample with 
non-tailed, sheared DNA, yet a smear is amplified in the G-tailed library sample. Higher 
amounts of template cause the appearance of multiple discrete bands in the controls. Thus, in 
subsequent primary amplifications the amount of template is kept at 50 ng per PCR™ 
reaction. An additional advantage of using a lower amount of DNA is the lack of discrete 
bands in the amplified smear from the G-tailed library. The presence of such bands can 
compromise the sequencing quality from secondary amplicons due to abrupt and premature 
decreases in signal intensity (FIG. 12, compare lane 6 and lane 9), especially if the bands are 
short products. 

[0206] Secondary PCR™ is performed with diluted primary amplicons as
template, universal C10 primer, and specific primers located downstream from the primary
amplification sites. The primers used are: HS2+ and HS4+, nested for priming site HS3+;
HS14177+, nested for priming site HS4+; and HB8-, nested for priming site HB7- (FIG. 11
and Table I). PCR™ amplification is carried out in duplicate 25 µl reactions with 200 nM
nested primer, 100 nM C10 primer, and 1 µl of 1,000 to 10,000-fold diluted primary amplicon
as template. The PCR™ conditions included initial denaturation at 94°C for 2 min, first cycle
94°C for 10 sec, 68°C for 2 min and 10 sec, and an incremental increase of extension time of
2 sec per cycle for 36 more cycles. Aliquots of 10 µl of each PCR™ reaction are analyzed by
electrophoresis on 1% agarose gels (FIG. 14). As shown in FIG. 14, discrete patterns of
amplified fragments are obtained in the secondary amplification.
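
The incremental-extension program described above can be tabulated with a short sketch. It assumes the 2 sec increment accumulates, so that each of the 36 later cycles extends 2 sec longer than the cycle before it; the function name and parameters are illustrative and are not part of the protocol.

```python
def extension_schedule(first_extension_s=130, increment_s=2, extra_cycles=36):
    """Return the 68°C extension time, in seconds, for every cycle.

    Assumption: the first cycle uses 2 min 10 sec (130 s) and each of the
    following cycles adds `increment_s` seconds to the previous cycle.
    """
    return [first_extension_s + increment_s * i for i in range(extra_cycles + 1)]


schedule = extension_schedule()
print(len(schedule))               # 37 cycles in total
print(schedule[0], schedule[-1])   # 130 202
print(sum(schedule))               # 6142 seconds of extension overall
```

Under this reading the last cycle extends for 3 min 22 sec, and extension alone accounts for roughly 102 min of the run.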

[0207] The products of the secondary PCR™ amplifications are quantified from 
the stained gel against standard DNA marker bands using the volume quantitation tool of 
Fluor-S Imager software (BioRad; Hercules, CA). The nested PCR™ products are purified 
free of primers and nucleotides with the QIAquick PCR™ purification kit (Qiagen; Valencia, 
CA), eluted in 50 µl of 3 mM Tris-HCl, pH 7.5 and used as template for cycle sequencing
with the same primers used for nested PCR™, or with additional nested primers for walking
sequencing. 

[0208] Cycle sequencing is carried out by mixing 2 to 11 µl of sequencing
template, containing 40 to 250 ng of total DNA, with 1 µl of a 5 µM solution of each
sequencing primer and 8 µl of DYEnamic ET terminator reagent mix (Amersham Pharmacia
Biotech; Piscataway, NJ) in 96 well plates in a final volume of 20 µl. Amplification is
performed for 30 cycles at: 94°C for 20 sec, 58°C for 15 sec, and 60°C for 75 sec. Samples 
are precipitated with 70% ethanol and analyzed on a MegaBACE 1000 capillary 
electrophoresis sequencing system (Amersham Pharmacia Biotech; Piscataway, NJ) using the 
manufacturer's protocol. 

[0209] Table II shows a summary of the sequencing results obtained for the three 
targeted tp53 genomic regions. The average read length of the analyzed sequences is above 
600 bases. A sequence is considered to be a failure if 100 or fewer bases are identifiable. Valid
sequencing reads were constrained to a preset threshold score of >20 using the Phred
algorithm (CodonCode Corporation; Dedham, MA), which corresponds to an error
probability of 1%. Sequence accuracy as compared to the published human tp53 sequences
(AF136270 and XM_043211) is greater than or equal to 98%.
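
The relationship between a Phred score and its error probability, and the read-failure rule stated above, can be illustrated with a minimal sketch. The conversion Q = -10 log10(P) is the standard Phred definition. The helper names are hypothetical, and treating "identifiable" bases as those scoring above the threshold is an assumption made only for this illustration.

```python
import math


def phred_to_error_probability(q):
    """Standard Phred definition: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def count_bases_above(qualities, threshold=20):
    """Count bases whose quality score exceeds the threshold."""
    return sum(1 for q in qualities if q > threshold)


def is_failed_read(qualities, threshold=20, min_bases=100):
    """Assumed rule: a read fails if 100 or fewer bases pass the threshold."""
    return count_bases_above(qualities, threshold) <= min_bases


print(phred_to_error_probability(20))  # 0.01, i.e. a 1% error probability
```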

[0210] Thus, this example demonstrates that specific genomic loci can be 
amplified and sequenced with a high level of accuracy from TRF libraries from higher
eukaryotic organisms. 

EXAMPLE 6: PREPARATION OF TRF LIBRARY FROM CORN GENOMIC DNA
BY HYDRODYNAMIC SHEARING

[0211] This example describes the preparation of a TRF library with an average size of
about 3 kb from corn genomic DNA by hydrodynamic shearing.

[0212] DNA from wild type 6N615 corn strain is isolated from seedlings using the
Roche (Nutley, NJ) Plant DNA Isolation Kit (Cat # 1667 319) with the indicated
modifications. Two grams of plant tissue material are frozen in liquid nitrogen and processed
with five grinding beads by vortexing for 2 min at maximum speed. Beads are removed, and
the pulverized plant material is lysed following the manufacturer's protocol for 10 min at
65°C. Proteins and other impurities are precipitated on ice, the supernatant is cleared by
filtration through a cloth filter, and total nucleic acids are precipitated at -20°C for 20 min.
The pellet is rinsed 3 times with 70% ethanol, dissolved in 300 µl buffer #4 at 65°C, and the
supernatant is treated with 18 µl of RNase cocktail (Ambion; Austin, TX; 500 U/ml RNase
A, 20,000 U/ml RNase T1) at 37°C for 25 min. Following two extractions with
phenol/chloroform/isoamyl alcohol (25:24:1 by volume), the aqueous phase is supplemented
with 1/10 vol. of 3 M sodium acetate, pH 5.0 and 2.5 volumes of absolute ethanol at room
temperature. The DNA pellet is rinsed 4 times with 70% room temperature ethanol, and the
DNA is dissolved in 300 µl of TE-L buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 7.5). The
typical yield is 30 to 60 µg DNA per gram of tissue.

[0213] Genomic DNA is diluted to 100 ng/µl in TE-L buffer. The sample is
incubated at 45°C for 5 min, vortexed for 2 min at maximum speed, and centrifuged at 16,000
x g for 10 min at room temperature. To avoid the presence of particulate matter, the
supernatant is slowly aspirated and transferred to a clean tube, sacrificing the last 50
microliters. Aliquots of 180 µl of the DNA prep are subjected to mechanical fragmentation
using the HydroShear device (Gene Machines) for 20 passes at a speed code of 9 following
the manufacturer's protocol. The sheared DNA has an average size of 3 kb as predicted by
the manufacturer and confirmed by gel electrophoresis. To prevent DNA carry-over
contamination, the shearing assembly of the HydroShear is washed 3 times each with 0.2 M
HCl and 0.2 M NaOH, and 5 times with TE-L buffer prior to and after fragmentation. All
solutions are 0.2 µm filtered before use.

[0214] Homopolymeric G-tails, consisting of about 10 to 15 nucleotides, are
enzymatically added to the 3'-termini of the DNA fragments by terminal deoxynucleotidyl
transferase. DNA template at 20 ng/µl is incubated with 40 units of New England Biolabs
(NEB; Beverly, MA) terminal transferase in 1x NEB restriction buffer # 4 containing 0.25
mM CoCl2 and 5 to 20 µM dGTP in a final volume of 100 µl for 20 min at 37°C. The reaction is
stopped by adding 4 µl of 0.5 M EDTA, pH 8.0. The sample is supplemented with 1/10 vol.
of 3 M sodium acetate, pH 5.0, precipitated with 2.5 volumes of ethanol in the presence of 2
µg glycogen, and centrifuged for 30 min at 16,000 x g. The pellet is then washed twice with
70% ethanol at room temperature and dissolved in TE-L buffer. Aliquots of 1 µg of the
library are analyzed by electrophoresis on a 1% agarose gel. Library DNA is stored at
-20°C.
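
As a rough check on the tailing reaction above, the following sketch estimates how many dGTP molecules are available per 3' end. It rests on assumptions not stated in the text: an average mass of about 650 g/mol per base pair of double-stranded DNA, two 3' ends per fragment, and a uniform 3 kb fragment length. The function names are illustrative only, and the actual tail length depends on enzyme kinetics rather than on this ratio alone.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation (assumption)


def three_prime_ends_pmol(conc_ng_per_ul, volume_ul, mean_length_bp):
    """Estimate picomoles of 3' ends in a double-stranded DNA sample."""
    mass_ng = conc_ng_per_ul * volume_ul
    # ng -> pmol of fragments: mass_ng * 1e-9 g / (g/mol) * 1e12 pmol/mol
    pmol_fragments = mass_ng * 1000.0 / (mean_length_bp * AVG_BP_MASS)
    return 2 * pmol_fragments  # two 3' ends per duplex fragment


def dntp_pmol(conc_um, volume_ul):
    """Micromolar times microliters gives picomoles."""
    return conc_um * volume_ul


ends = three_prime_ends_pmol(20, 100, 3000)  # about 2 pmol of 3' ends
for dgtp_um in (5, 20):
    ratio = dntp_pmol(dgtp_um, 100) / ends
    print(f"{dgtp_um} µM dGTP: about {ratio:.0f} dGTP per 3' end")
```

Under these assumptions the nucleotide is in large molar excess over the available 3' ends across the whole 5 to 20 µM range.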

EXAMPLE 7: POSITIONAL AMPLIFICATION AND SEQUENCING OF FOUR
GENOMIC REGIONS IN CORN FROM A TRF LIBRARY PREPARED BY
HYDRODYNAMIC SHEARING

[0215] This example describes amplification and sequencing of four specific corn
genomic regions from a TRF library (FIG. 15). In the primary step of PCR™ amplification,
a proximal primer is used along with the universal 10-mer poly-C (C10) primer. The amplified
DNA is diluted and used as template for nested or secondary PCR™ amplification with a
distal primer and the C10 primer. The products of the nested amplification are then utilized as
templates for cycle sequencing with the same primer used in nested PCR™ or with more
distal walking sequencing primers.

[0216] Amplification primers are designed using Oligo version 6.53 primer
analysis software (Molecular Biology Insights, Inc., Cascade, CO). Primers are 21-23 bases
long, having high internal stability, low 3'-end stability, and melting temperatures of 57-62°C
(at 50 mM salt and 2 mM MgCl2). Primers are designed to meet all standard criteria such as
low primer-dimer and hairpin formation and are filtered against a corn genomic database
6-mer frequency table.

[0217] Primary PCR™ is carried out with 200 nM specific primer, 100 nM of
universal C10 primer, and 80 ng of corn TRF library DNA (described in Example 6) in a final
volume of 25 µl under standard Titanium Taq Polymerase conditions (Clontech). After
initial denaturing at 94°C for 2 min, samples are subjected to 37 cycles at 94°C for 10 sec,
68°C for 2 min and 15 sec, and a final extension at 72°C for 3 min. In some cases (genomic
regions 3 and 4; see below) primary PCR™ amplification is done by initial denaturing at
94°C for 2 min, first cycle 94°C for 10 sec, 68°C for 2 min and 10 sec, and incremental
increase of extension time of 2 sec per cycle for 36 more cycles. Control reactions are
performed under the same conditions with 80 ng of fragmented but not tailed human DNA as
template. Aliquots of 12 µl of each PCR™ reaction are analyzed by electrophoresis on 1%
agarose gels.

[0218] Secondary (nested) PCR™ is carried out with diluted primary amplicons
as template, universal C10 primer, and specific primers downstream from the primary
amplification sites. PCR™ amplification is performed in duplicate 25 µl reactions with 200 nM nested
primer, 150 nM C10 primer, and 1 µl of 1,000 x diluted primary amplicon as template, by initial
denaturing at 94°C for 2 min, first cycle 94°C for 10 sec, 68°C for 2 min and 10 sec, and
incremental increase of extension time of 2 sec per cycle for 36 more cycles. Aliquots of 10
µl of each PCR™ reaction are analyzed by electrophoresis on 1% agarose gels.

[0219] The products of the secondary PCR™ amplifications are quantified against
standard DNA marker bands using the volume quantitation tool of Fluor-S Imager software
(Bio-Rad). The nested PCR™ products are purified free of primers and nucleotides using the
QIAquick PCR™ purification kit (Qiagen), eluted in 50 µl of 3 mM Tris-HCl, pH 7.5 and
used as template for cycle sequencing with the same primers used for nested PCR™ or with
additional nested primers for walking sequencing. Cycle sequencing is carried out by mixing 2
to 11 µl of sequencing template containing 40 to 250 ng of total DNA with 1 µl of each
sequencing primer at 5 µM, and 8 µl of DYEnamic ET terminator reagent mix (Amersham
Pharmacia Biotech; Piscataway, NJ) in 96 well plates in a final volume of 20 µl.
Amplification is for 30 cycles at: 94°C for 20 sec, 58°C for 15 sec, and 60°C for 75 sec.
Samples are precipitated with 70% ethanol and analyzed on a MegaBACE 1000 capillary
electrophoresis sequencing system (Amersham Pharmacia Biotech; Piscataway, NJ) using the
manufacturer's protocol. A sequence is considered to be a failure if 100 or fewer bases are
identifiable. Valid sequencing reads were constrained to a preset threshold score of >20
using the Phred algorithm (CodonCode Corporation, Dedham, MA), which corresponds to
an error probability of 1%.

[0220] The following genomic regions are analyzed (see FIG. 15): 

[0221] Region 1. asg60.slb. The sequence is a 456 bp STS mapped to
chromosome 5 published in Cold Spring Harbor Maize Genome Analysis Database (which 
can be found on their website). The unknown downstream flanking region is amplified and 
sequenced using primer asg60.sl 133+ for primary amplification and primer asg60.sl 405+ 
for both nested amplification and sequencing (Table I). The average read length from three 
individual sequencing runs is 562 bases (range 547-581) at a Phred score of >20. A 
consensus sequence of 696 bp is assembled from the three sequencing chromatogram files. 

[0222] Region 2. Maysine enhancer. A genomic region of 1,376 bp
corresponds to the corn transcriptional regulator gene (Accession # AF136530), which is a
homologue of the silk Maysine enhancer and is mapped as a single copy gene to the sh2-a1 region
on chromosome 3 (United States Department of Agriculture/Agricultural Research Service
and University of Missouri Maize Genomic Center Database). The unknown upstream
flanking region is amplified with primers Zea X 211- and Zea X 254- in primary PCR™ from
the corn TRF library and re-amplified with primers Zea X 211-, Zea X 149-, and Zea X 49- in
nested PCR™ (Table I, FIG. 16). Each of the nested PCR™ primers is also used as
sequencing primer in three individual cycle sequencing reactions. The average read length
from six quality sequencing runs is 583 bases (range 421-703) at a Phred score of >20.
A consensus sequence of 782 bp is assembled from the sequencing chromatogram files.

[0223] Region 3. MubG1 Upstream Region. A unique 500 bp sequence from
the published MubG1 (Poly-Ubiquitin gene 1) promoter is used to design primers. The
unknown flanking region upstream of the promoter is amplified with primers MubG1 218-,
MubG1 317-, and MubG1 356- in primary PCR™ from the corn TRF library and with primers
MubG1 24-, MubG1 218-, and MubG1 317- in nested PCR™ (Table I, FIG. 17). Primers
MubG1 218- and MubG1 24- are used with the three amplified templates in three individual
cycle sequencing reactions. The average read length from a total of nine runs is 578 bases
(range 444-652) at a Phred score of >20. A consensus sequence of 867 bp is assembled from
the raw data sequencing chromatogram files.

[0224] Region 4. MubG1 Downstream Region. A unique 500 bp sequence
from the genomic MubG1 contig located at the 3'-end of the poly-Ubiquitin gene is used to
design primers. The unknown flanking downstream region is amplified with primers MubG1
393+ and MubG1 395+ in primary PCR™ from the corn TRF library and re-amplified with
primers MubG1 428+ and MubG1 430+ in nested PCR™ (Table I, FIG. 17). Primers MubG1
428+ and MubG1 430+ are used in sequencing with the two sequencing templates derived
from nested PCR™ and in 3 individual cycle sequencing reactions. The first primer failed to
produce good quality sequencing ladders. The average read length from the three quality
sequencing runs with primer MubG1 430+ is 624 bases (range 616-639) at a Phred score of
>20. A consensus sequence of 626 bp is assembled from the sequencing chromatogram files.

[0225] Thus, in this example four out of four attempted genomic regions were 
successfully sequenced. The average read length at a Phred score of >20 is 581 bases. The 
total high quality sequence generated is 2,971 bases of which 1,350 bases are sequenced de
novo and do not match any reference sequences. Out of 1,621 bases of new sequences
overlapping reference regions, the total number of mismatches is six. One out of eight 
sequencing primers did not produce a sequencing ladder of acceptable quality. 
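
The totals in this summary can be cross-checked against the per-region consensus lengths reported above. The sketch below uses only figures stated in this example; the variable names are illustrative, and expressing the six mismatches as a percentage accuracy is an added derivation rather than a figure given in the text.

```python
# Consensus lengths (bp) reported for the four corn regions in this example.
consensus_bp = {"Region 1": 696, "Region 2": 782, "Region 3": 867, "Region 4": 626}

total_bases = sum(consensus_bp.values())
de_novo_bases = 1350                      # bases with no matching reference
overlapping_bases = total_bases - de_novo_bases
mismatches = 6

accuracy = 1 - mismatches / overlapping_bases

print(total_bases)         # 2971
print(overlapping_bases)   # 1621
print(f"{accuracy:.2%}")   # 99.63%
```

The four consensus lengths sum to the stated 2,971 bases, and six mismatches in 1,621 overlapping bases correspond to an accuracy of about 99.6% over the reference-covered portion.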

EXAMPLE 8: PREPARATION OF TRF LIBRARY FROM E. COLI GENOMIC DNA 
BY THERMAL FRAGMENTATION METHOD 

[0226] This example describes the preparation of a TRF library with an average size
of 1 kb from E. coli genomic DNA, particularly by DNA hydrolysis at high temperature
under neutral conditions and terminal transferase mediated tailing with deoxyguanosine
triphosphate.

[0227] The prepared library allows reproducible amplification of many nested
DNA mixtures using one sequence-specific primer and universal homopolymeric primer C10
(containing ten cytosines). Sequencing of these mixtures using the same primer generates
600 - 800 base reads that are adjacent to chosen kernel primers.

[0228] DNA is isolated by standard purification from, for example, E. coli strain
MG1655 and diluted to 200 ng/µl in TE-L buffer (10 mM Tris-HCl, pH 7.5; 0.1 mM EDTA).
To thermally fragment the DNA, the sample is incubated at 95°C for 5 min in Mini Cycler 
machine (MJ Research) using the heating lid. For comparison, mechanically broken DNA 
sample is prepared as described in Examples 1, 4 and 6, except that the fragmentation on a 
HydroShear device (Gene Machines) is achieved by 20 passes at a speed code of 3. The 
average size of fragmented DNA is then analyzed by electrophoresis on a 1% agarose gel 
under alkaline conditions. FIG. 18 shows the DNA size distributions after thermal 
fragmentation and hydrodynamic shearing. 

[0229] Homopolymeric G tails, consisting of 10 to 15 nucleotides, are
enzymatically added to the 3'-termini of the DNA fragments by terminal deoxynucleotidyl
transferase. DNA template at 10 ng/µl is incubated with 20 units of New England Biolabs
(NEB) terminal transferase in 1x NEB restriction buffer # 4 containing 0.25 mM CoCl2 and
20 µM dGTP in a final volume of 100 µl for 15 min at 37°C. The reaction is stopped by
adding 10 µl of 0.5 M EDTA, pH 8.0. The sample is supplemented with 1/10 volume of 3
M sodium acetate, pH 5.0, precipitated with 2.5 volumes of ethanol in the presence of 2 µg
glycogen, washed twice with 70% ethanol at room temperature, and dissolved in TE-L buffer.

EXAMPLE 9: AMPLIFICATION AND SEQUENCING OF E. COLI DNA REGIONS
WITH SPECIFIC PRIMERS FROM TRF LIBRARY PREPARED BY THERMAL
FRAGMENTATION METHOD VS. LIBRARY PREPARED BY HYDRO-SHEARING
METHOD

[0230] Primers for amplification are designed using Oligo version 6.53 primer
analysis software (Molecular Biology Insights, Inc., Cascade, CO). Primers are 21 to 23 bases
long, having high internal stability, low 3'-end stability, and melting temperatures of 57°C to
62°C (at 50 mM salt and 2 mM MgCl2). Primers are designed to meet all standard criteria
such as low primer-dimer and hairpin formation and are filtered against an E. coli genomic
6-mer frequency database.

[0231] Oligonucleotides for PCR™ amplifications are designed to target 
amplicons of two specific regions of the E. coli DNA: primers S3, S6 (Table I). 

[0232] TD PCR™ amplification is performed with 300 nM specific primer, 300
nM of universal C10 primer, and 40 ng of E. coli TRF library DNA (described in Example 8)
in a final volume of 25 µl under standard Titanium Taq Polymerase conditions (Clontech;
Palo Alto, CA). After initial denaturing at 95°C for 2 min, samples are subjected to 20 cycles
at 95°C for 15 sec, 73°C for 2 min and 15 sec, with decreasing temperature of 0.5°C in each
cycle. The next round of amplification is 25 cycles at 95°C for 15 sec, 60°C for 2 min, with
increasing time of extension of 1 sec each cycle. Aliquots of 12 µl of each PCR™ reaction
are analyzed by electrophoresis on a 1% agarose gel (FIG. 19). As shown, a uniform smear is
obtained when TRF library prepared by hydrodynamic shearing is used as the template,
whereas a smear with some faint discrete bands is amplified from TRF library prepared by
thermal fragmentation.
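
The two-phase cycling program above can be written out cycle by cycle. This is a sketch under stated assumptions: the first cycle of the first phase runs at 73°C and each later cycle is 0.5°C lower, and the first cycle of the second phase extends for 2 min with 1 sec added in each later cycle. The function name and dictionary keys are hypothetical.

```python
def td_pcr_program():
    """Build the per-cycle parameters for the two amplification phases."""
    cycles = []
    # Phase 1: 20 cycles, combined annealing/extension temperature drops
    # by 0.5°C per cycle starting from 73°C (2 min 15 sec = 135 s).
    for i in range(20):
        cycles.append({"phase": 1, "cycle": i + 1,
                       "denature_c": 95, "denature_s": 15,
                       "extend_c": 73 - 0.5 * i, "extend_s": 135})
    # Phase 2: 25 cycles at 60°C, extension grows by 1 s per cycle from 120 s.
    for i in range(25):
        cycles.append({"phase": 2, "cycle": i + 1,
                       "denature_c": 95, "denature_s": 15,
                       "extend_c": 60, "extend_s": 120 + i})
    return cycles


program = td_pcr_program()
print(len(program))               # 45 cycles in total
print(program[19]["extend_c"])    # 63.5, last cycle of phase 1
print(program[-1]["extend_s"])    # 144, last cycle of phase 2
```

Under this reading the first phase ends at 63.5°C, a few degrees above the fixed 60°C used in the second phase.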

[0233] The PCR™ amplification products are quantified from the stained gel by
comparison with standard DNA markers using the volume quantitation tool of Fluor-S
Imager software (Bio-Rad). The PCR™ products are purified free of primers and nucleotides
by the QIAquick PCR™ purification kit (Qiagen), eluted in 30 µl of 1 mM Tris-HCl, pH 7.5
and used as template for cycle sequencing with the same primers used for PCR™.

[0234] Cycle sequencing is performed by mixing 2 to 11 µl of sequencing
template, containing 40 to 250 ng of total DNA, with 1 µl of 5 µM each sequencing primer
and 8 µl of DYEnamic ET terminator reagent mix (Amersham Pharmacia Biotech;
Piscataway, NJ) in 96 well plates in a final volume of 20 µl. Amplification is performed for 30
cycles at: 94°C for 20 sec, 58°C for 15 sec, and 60°C for 75 sec. Samples are precipitated
with 70% ethanol and analyzed on a MegaBACE 1000 capillary electrophoresis sequencing
system (Amersham Pharmacia Biotech; Piscataway, NJ) using the manufacturer's protocol.

[0235] Table III shows a comparison of the sequencing results obtained from the
two regions of the E. coli genome from TRF libraries prepared by thermal fragmentation and
hydrodynamic shearing methods. For both libraries, the average read length of the analyzed
sequences is above 600 bases. Sequence accuracy as compared to the published E. coli K12
MG1655 sequences is equal to or greater than 98%.

Table III. Comparison of the Sequencing Results for Two Regions of the E. coli Genome
Amplified From Thermally Fragmented and Hydro-Sheared TRF Libraries

Library          Sequenced Region   Read Length at Phred >20   Accuracy of the Read
                                                               (% match with published sequence)
TRF-TF Library   S3 Region          671                        98%
TRF-TF Library   S6 Region          734                        98%
TRF-HS Library   S3 Region          700                        99%
TRF-HS Library   S6 Region          700                        99%

[0236] This example demonstrates that specific genomic regions can be amplified 
and sequenced with a high level of accuracy and long read length from a TRF library 
prepared by thermal fragmentation from bacterial DNA. 

EXAMPLE 10: HIGH THROUGHPUT PREPARATION, AMPLIFICATION AND
SEQUENCING OF MULTIPLE TRF DNA LIBRARIES CREATED BY THERMAL
FRAGMENTATION METHOD

[0237] This example describes parallel preparation of multiple TRF libraries from 
different DNA sources. The proposed protocol is based on the reasonable assumption that 
preparation of the TRF libraries by thermal fragmentation procedure and terminal transferase 
mediated G-tailing reaction can be easily scaled up to the 96 or 384 multi-well format. 

[0238] FIG. 20 shows schematically all steps involved in preparation of the TRF 
library in the multi-well format. The drawing shows only a 36-well plate, but the format can be
96, 384, 1536 wells or larger.

[0239] Important steps involved in the protocol include, for example: 1)
preparation of DNA in low salt TE buffer; 2) incubation of DNA at high temperature (for
example, 95°C) for a specific time (for example, 5 min); enzymatic addition of the
homopolymeric G-tails to the 3' ends of DNA fragments by terminal transferase; 3) DNA
purification by ethanol precipitation or spin-column; 4) PCR™ (nested PCR™) amplification
using sequence-specific primer(s) S (Sn) and universal homopolymeric primer C10; 5) removal
of primers and nucleotides; 6) cycle sequencing using sequence-specific primer S or Sn; 7)
DNA purification by ethanol precipitation or spin-column; and/or 8) analysis of the DNA
samples by the 96-capillary DNA sequencing device.

EXAMPLE 11: THERMAL FRAGMENTATION OF DNA UNDER DIFFERENT
BUFFER AND SALT CONDITIONS

[0240] This example illustrates the efficiency of DNA thermal fragmentation at 
low salt conditions and demonstrates the inhibitory effect of monovalent and divalent cations 
on the DNA degradation during incubation at high temperature. 

[0241] DNA was isolated by standard purification from E. coli strain MG1655,
ethanol precipitated, washed with 70% ethanol and dissolved in TE buffer at a concentration
of 100 µg/ml. One µg DNA aliquots were ethanol-precipitated in the presence of 2 µg
glycogen (Roche), centrifuged for 30 min at 16,000 x g, washed twice with 70% ethanol at
room temperature and then dissolved in 10 µl of the following solutions: ultra pure distilled
water (GIBCO); TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7.5); TE buffer diluted 20
times (500 µM Tris-HCl, 50 µM EDTA, pH 7.5); TE buffer supplemented with 10 mM
MgCl2; 1 mM EDTA alone, pH 8.0; 100 mM EDTA alone, pH 8.0; 10 mM Tris-HCl alone,
pH 7.5; 1 M Tris-HCl, pH 7.5; 1 x NEBuffer 4 (New England Biolabs; Beverly, MA)
containing 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM
dithiothreitol, pH 7.9; 1 x NEBuffer 4 supplemented with 250 µM CoCl2; or 1 x PCR buffer
(Clontech) containing 40 mM Tricine-KOH, 16 mM KCl, 3.5 mM MgCl2, 3 ng/µl BSA, pH 8.0.
DNA samples were subjected to thermo-fragmentation in a MJ Research PTC-150
MiniCycler with heating lid. Samples were incubated at 95°C for the indicated times and then
analyzed by alkaline agarose gel. Electrophoresis was performed in 1% agarose (Maniatis et
al., 1989) with 40 mM NaOH and 1 mM EDTA as a buffer. The gel was run at 1 V/cm (240-
280 mA) for 16 hr at room temperature with buffer circulation. After electrophoresis, the gel
was neutralized, stained with SYBR Gold (Molecular Probes), and analyzed using Bio-Rad
Fluor S Imager.

[0242] FIG. 21A shows the kinetics of thermal fragmentation of DNA in two low
salt buffers and water. The data show that high molecular weight DNA (FIG. 21A, lane 2)
can be converted into 1-2 kb fragments within minutes of exposure at 95°C. Longer times (up
to 30 min) of heat treatment (FIG. 21A, lanes 8, 14, and 21) lead to reduction of the average
size of DNA down to 100 bases. The rate of thermal fragmentation in water (FIG. 21A, lanes
3-8) and diluted TE buffer (FIG. 21A, lanes 9-14) is higher than in TE buffer (FIG. 21A,
lanes 16-21).

[0243] The inhibitory effect of different salts and buffers on thermal
fragmentation of DNA is shown in FIG. 21B for a constant time of incubation (30 min).
Incubation of DNA at 95°C in 1 M Tris-HCl (FIG. 21B, lane 7), 100 mM EDTA (FIG. 21B,
lane 8), PCR buffer (FIG. 21B, lane 10) and NEBuffer 4 (FIG. 21B, lane 12) results in a mild
change of the original size of DNA (FIG. 21B, lane 3). In contrast, incubation of DNA at
95°C in low salt buffers such as TE (FIG. 21B, lane 2), H2O (FIG. 21B, lane 4), 10 mM
Tris-HCl (FIG. 21B, lane 5) and 1 mM EDTA (FIG. 21B, lane 6) produces DNA fragments
smaller than 1,000 bases. Addition of 10 mM MgCl2 to TE buffer (FIG. 21B, lane 9) also
causes a strong inhibition of DNA thermal degradation (compare with FIG. 21B, lane 2).
Addition of Co2+ ions to NEBuffer 4 has no effect on the rate of thermo-fragmentation (FIG.
21B, lane 11).

[0244] Thus, this example demonstrates that DNA can be fragmented very 
efficiently at neutral pH by thermal treatment at 95°C. The size of fragmented DNA can be 
controlled by time and buffer / salt concentration. The presence of Mg2+ ions also prevents
degradation of DNA. 

EXAMPLE 12: MECHANISM OF HEAT-INDUCED DNA 
FRAGMENTATION AT NEUTRAL pH 

[0245] This example shows that thermal fragmentation occurs predominantly at 
purine bases, suggesting a two-step mechanism that is initiated by heat-induced hydrolysis of 
glycosyl bond with the release of purine bases and followed by a heat-induced breakage of 
DNA molecule at the apurinic sites. 

[0246] Two pyrimidine-rich oligonucleotides, 29 residues long, with a fluorescein
group at the 5' end, an amino-modifier group at the 3' end, and only one purine base in the
middle, were synthesized: oligonucleotides OL1 (SEQ ID NO:29) and OL2 (SEQ ID NO:30)
with dG and dA bases in position 19, respectively (Table IV).
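
The design of the two model oligonucleotides can be verified directly from the OL1 and OL2 sequences listed in Table IV. The sketch below is only an illustrative check that each sequence has 29 residues and a single purine at position 19 (counting from the 5' end); the helper name is hypothetical.

```python
PURINES = set("AG")

# Sequences as listed in Table IV (5' to 3'), without the end modifications.
OLIGOS = {
    "OL1": "TCT CCT TCC TCC TTT CTC GCT TCT CTC CT",
    "OL2": "TCT CCT TCC TCC TTT CTC ACT TCT CTC CT",
}


def purine_positions(sequence):
    """Return 1-based positions of purine bases, ignoring spaces."""
    bases = sequence.replace(" ", "")
    return [i for i, base in enumerate(bases, start=1) if base in PURINES]


for name, sequence in OLIGOS.items():
    length = len(sequence.replace(" ", ""))
    print(name, length, purine_positions(sequence))
# OL1 29 [19]
# OL2 29 [19]
```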

Table IV. Oligonucleotides used for experiments described in Examples 12 and 14 - 18.

Oligonucleotide ID (a)   Sequence (5'-3')
1. OL1                   5' 6-FAM™ (b) - TCT CCT TCC TCC TTT CTC GCT TCT CTC CT - 3'AmMod C7 (c)
2. OL2                   5' 6-FAM™ - TCT CCT TCC TCC TTT CTC ACT TCT CTC CT - 3'AmMod C7
3. OL3                   5' 6-FAM™ - TCT CCT TCC TCC TTT CTC GCT TCT CTC CT
4. OL4                   5' 6-FAM™ - TCT CCT TCC TC - 3'AmMod C7
5. OL5                   5' 6-FAM™ - TCT CCT TCC TC
6. OL6                   5' 6-FAM™ - TCT CCT TCC T
7. OL7                   5' 6-FAM™ - TCT CCT TCC TC - 3'ddC (d)

a) All oligonucleotides are synthesized and purified commercially

b) 5' 6-FAM™ - 6-carboxyfluorescein

c) 3'AmMod C7 - 3'-amino-modifier; it eliminates the native 3'-OH group from the
oligonucleotide, which functionally blocks this oligo from participating as a primer in
DNA synthesis

d) 3'ddC - dideoxy-C is a 3' chain terminator that prevents 3' extension by polymerases

[0247] Ten pmol of these oligonucleotides were diluted in 10 µl of water
(GIBCO) and then subjected to thermo-fragmentation in a MJ Research MiniCycler with 
heating lid. Samples were incubated at 95°C over a time course and then analyzed on 15% 
denaturing polyacrylamide TBE-Urea gels (Invitrogen / Novex) (FIG. 22). The gels were run 
at 180 V for 45 min at constant temperature (55 °C) in a Red Roller hybridization oven 
(Hoefer). After electrophoresis, the gels were analyzed using Bio-Rad Fluor S Imager with 
Fluorescein filter and Quantity One software. 

[0248] FIG. 22A shows the kinetics of thermal fragmentation of the 
oligonucleotide OL1 with G base. After 20 minutes of incubation at 95°C, two distinct bands 
can be seen on the gel, and they reach equal intensity at 40 min of incubation. The upper band 
is unbroken fluorescein-labeled oligonucleotide, and the lower band corresponds to 
fluorescein-labeled 19-mer created as a result of cleavage at the dG site. After one hour of 
thermal treatment at 95°C, more than 50% of oligo is converted into the 19 base product, and 
smaller fragments appear, indicating that chain breakage occurs not only at the dG site but
also at dC and dT bases, although at a much lower rate. After 110 min of exposure at 95°C,
almost all original molecules are hydrolyzed and converted into 19 base and shorter products. 

[0249] The kinetics of thermal fragmentation of the oligonucleotide OL2 with the 
purine base A is shown on FIG. 22B. It proceeds in a similar way as for oligonucleotide OL1 
but with a somewhat slower rate. In this case the first product of thermo-hydrolysis appears 
only after 30 min of incubation at 95°C, and the bands become equal in intensity after 50 min. 
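
The time courses described for OL1 and OL2 can be compared with a simple first-order model. This is a deliberately simplified sketch and not the analysis used in this example: it assumes that the time at which the intact and cleaved bands reach equal intensity (40 min for OL1, 50 min for OL2) approximates a half-time, and it ignores both the initial lag expected for a two-step mechanism and the minor cleavage at pyrimidines. The function names are illustrative.

```python
import math


def rate_constant_per_min(half_time_min):
    """First-order rate constant from a half-time: k = ln(2) / t_half."""
    return math.log(2) / half_time_min


def fraction_intact(time_min, half_time_min):
    """Fraction of full-length oligonucleotide remaining after time_min."""
    return math.exp(-rate_constant_per_min(half_time_min) * time_min)


for name, half_time in (("OL1 (dG)", 40), ("OL2 (dA)", 50)):
    k = rate_constant_per_min(half_time)
    print(f"{name}: k = {k:.4f} per min, "
          f"intact after 60 min = {fraction_intact(60, half_time):.2f}, "
          f"after 110 min = {fraction_intact(110, half_time):.2f}")
```

For OL1 this simplified model leaves about 35% of the molecules intact after one hour and about 15% after 110 min, broadly in line with the observation that more than half of the oligonucleotide is cleaved by one hour and most of it by 110 min.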

[0250] Previous studies described several types of lesions introduced into DNA 
by heat: DNA strand breaks, apurination, guanine oxidation and deamination of cytosine. The 
data provided herein clearly show that heat-induced strand breaks at neutral pH occur 
predominantly at purinic bases, and they are most likely the result of heat-induced 
apurinization in DNA. 

EXAMPLE 13: TdT TAILING OF DNA AFTER THERMAL FRAGMENTATION

[0251] This example demonstrates the availability of DNA termini, particularly 3'
ends generated after thermal fragmentation, to enzymatic tailing by terminal transferase. 

[0252] DNA was isolated by standard purification from fresh human 
lymphocytes, ethanol precipitated and dissolved in TE buffer at a concentration of 100 ng/µl.

[0253] Five µg DNA aliquots were subjected to thermo-fragmentation in a MJ
Research MiniCycler with heating lid. Samples were incubated at 95°C for 5 minutes
followed by additional heat treatment at the same temperature for 10 minutes in NEBuffer 4
containing 10 mM magnesium acetate. This step was introduced with the anticipation that a
second heating in the presence of Mg2+ ions would stimulate chain breaks at apurinic sites left
after the first heating step (at low salt) without noticeable creation (and breakage) of any new
abasic sites (Lindahl and Andersson, 1972). This was confirmed by experiments on a model
oligonucleotide system. The reaction products were electrophoresed through a 1% agarose
alkaline gel, stained with SYBR Gold, and the bands representing the size around 1 kb were
excised from the gel. The molecules were extracted from the gel by using a DNA extraction
kit (Ultrafree-DA (Millipore)) and then ethanol precipitated. Next, the homopolymeric dG
tail, dA tail, and mixed dG and dA tail were enzymatically added to the 3'-termini of the
DNA fragments by terminal deoxynucleotidyl transferase. DNA templates at 100 ng/µl were
incubated with 10 units of terminal transferase (NEB) in 1x NEBuffer 4 containing 0.25 mM
CoCl2 and 100 µM dGTP or 100 µM dATP or a mixture of 100 µM dGTP and 100 µM dATP
in a final volume of 20 µl for 20 min at 37°C. The reaction was stopped by adding 2 µl of 0.5
M EDTA, pH 8.0. Samples were ethanol-precipitated and then analyzed on 6% denaturing
polyacrylamide TBE-Urea gels (Invitrogen / Novex). The gels were run at 180 V for 45 min
at a constant temperature of 55°C in a Red Roller hybridization oven (Hoefer). After
electrophoresis, gels were stained with SYBR Gold and analyzed using Bio-Rad Fluor S
Imager.

[0254] Results of the tailing of DNA fragments produced by thermo- 
fragmentation are presented on FIG. 23. Lanes 1 and 4 show the original 1 kb DNA size 
fraction after thermo-fragmentation. Lanes 2, 3 and 5 show the same DNA after incubation 
with terminal transferase and dGTP, dATP and dGTP/dATP mix, respectively. About 30% of 
heat-induced 3' DNA ends are tailed with dGTP/dATP mix (FIG. 23, lane 5). No tailing can 
be seen for dGTP and dATP nucleotides. 

EXAMPLE 14: HOMOPOLYMER TAILING REACTION CATALYZED BY
TERMINAL TRANSFERASE ON THERMALLY FRAGMENTED
OLIGONUCLEOTIDE TEMPLATE

[0255] This example characterizes TdT-mediated tailing efficiency of 
oligonucleotide termini produced by thermo-fragmentation process and describes a novel 3' 
end repair function of the terminal transferase enzyme. 

[0256] Three pyrimidine-rich oligonucleotides, 29 residues long, with a 
fluorescein group at the 5' end were used. Oligonucleotides OL1 and OL2 were synthesized 
with blocking group Amino Modifier C7 at the 3' end and one purine base (dG or dA,
respectively) in the middle (Table IV; see also Example 12). Oligonucleotide OL3 (SEQ ID
NO:31) is similar to oligonucleotide OL1 but has a 3'-OH group. Ten pmol of the
oligonucleotide OL1, OL2 or OL3 was diluted in 10 µl of water (GIBCO) and then subjected
to thermo-fragmentation at 95°C for 50 minutes in an MJ Research MiniCycler with a heating
lid. Products of thermo-fragmentation and non-heated oligonucleotides OL1, OL2 or OL3
were tailed by terminal deoxynucleotidyl transferase (TdT). Ten pmol of these
oligonucleotides were incubated with 10 units of terminal transferase (NEB) in 1x NEBuffer
4 containing 0.25 mM CoCl2 and 100 µM dGTP (FIG. 24A and 24B) or dATP (FIG. 24C) in a
final volume of 50 µl for 20 min at 37°C. The reaction was stopped by adding 5 µl of 0.5 M
EDTA, pH 8.0. Samples were ethanol-precipitated and then analyzed on a denaturing 15% 
polyacrylamide TBE-Urea gel (Invitrogen / Novex) (FIG. 24). The gels were run at 180 V for 
45 min at the constant temperature 55 °C in the Red Roller hybridization oven (Hoefer). After 
electrophoresis, gels were analyzed using Bio-Rad Fluor S Imager with Fluorescein filter and 
Quantity One software. 

[0257] Surprisingly, despite the presence of a 3' AmMod C7 group, which 
functionally should block oligonucleotides from participating as a primer in DNA synthesis, 
both oligonucleotides OL1 and OL2 are tailed efficiently with dGTP, and almost 100% of 
molecules receive G-tails and change their mobility (FIG. 24A and 24B). The 19-mer
products of thermo-fragmentation are also tailed but not completely. About 50% of these 
products are competent for G-tailing and change their mobility (FIG. 24A and 24B). At the 
same time, the 19-mer product of thermo-fragmentation of oligonucleotide OL3 shows no 
tailing in the presence of dATP. 

[0258] It is known that fragmentation via depurination produces DNA
fragments with enzymatically non-competent 3' ends (Kotaka and Baldwin, 1964; Lindahl
and Andersson, 1972). The data presented in this Example demonstrate a new function of
terminal transferase, specifically, the ability to process ends lacking a 3' hydroxyl group. It is
shown that in the presence of dGTP, TdT is able to tail a significant fraction (50%) of ends
resulting from breakage at the apurinic site and almost all ends terminated with Amino Modifier
C7, suggesting a novel 3' end repair function of the terminal transferase enzyme. The absence
of tailing in the presence of dATP suggests a special role for deoxyguanosine triphosphate in
the repair process catalyzed by TdT.

EXAMPLE 15: TdT-MEDIATED TAILING OF BLOCKED AND NORMAL
OLIGONUCLEOTIDE TEMPLATES: EFFECT OF dGTP CONCENTRATION 

[0259] This example compares tailing reactions catalyzed by terminal transferase 
in the presence of different concentrations of dGTP on 3' blocked and non-blocked model 
oligonucleotide templates. The titration of dGTP concentration was necessary to define the 
working concentration for oligonucleotide template. 

[0260] Two pyrimidine-rich oligonucleotides OL1 and OL3, each 29 residues 
long with a fluorescein group at the 5' end, were used. Oligonucleotide OL1 has the blocking
Amino Modifier C7 group and oligonucleotide OL3 the hydroxyl group at the 3' end (Table
IV). Ten pmol of these oligonucleotides were subjected to a tailing reaction in the presence of
different dGTP concentrations. Blocked and unblocked oligonucleotides were incubated with
10 units of terminal transferase (NEB) at 37°C (20 min) in 1x NEBuffer 4 containing 0.25
mM CoCl2 and the concentration of dGTP varying from 10 µM to 100 µM in a final volume
of 50 µl. One-fifth of the volume of the reaction mixture was analyzed on 15% denaturing
polyacrylamide TBE-Urea gels (Invitrogen / Novex) (FIG. 25). The gels were run at 180 V at 
a constant temperature of 55°C in the Red Roller hybridization oven (Hoefer). After 
electrophoresis, gels were analyzed using Bio-Rad Fluor S Imager with Fluorescein filter and 
Quantity One software. 

[0261] The experiment shows that complete tailing of the oligonucleotide with 3' 
OH group occurs at 10 µM dGTP (FIG. 25B). At a similar concentration of dGTP, the
oligonucleotide with 3' blocking group shows no detectable tailing (FIG. 25A). For blocked
3' ends, tailing becomes visible at 20 µM dGTP and reaches its maximum (more than 90%)
at 100 µM dGTP (FIG. 25A).
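
For context on this titration, the molar excess of dGTP over oligonucleotide 3' ends can be worked out from the stated amounts. The sketch below is illustrative only. It assumes a 50 µl final volume, as in the other oligonucleotide tailing reactions of these Examples, and it evaluates only the dGTP concentrations named in the text (10, 20, 50 and 100 µM); the full titration series is not stated. The helper name `pmol_in_reaction` is hypothetical.

```python
def pmol_in_reaction(conc_uM: float, volume_ul: float) -> float:
    """Amount in pmol: 1 uM in 1 ul corresponds to 1 pmol."""
    return conc_uM * volume_ul


oligo_pmol = 10.0  # oligonucleotide per reaction, as stated
volume_ul = 50.0   # final reaction volume (assumed, see lead-in)

# Fold excess of dGTP molecules over 3' ends (one 3' end per oligonucleotide).
excess = {
    dgtp_uM: pmol_in_reaction(dgtp_uM, volume_ul) / oligo_pmol
    for dgtp_uM in (10, 20, 50, 100)
}

for dgtp_uM, fold in excess.items():
    print(f"{dgtp_uM:>3} uM dGTP: {fold:.0f}-fold molar excess over 3' ends")
```

Under these assumptions the nucleotide stays in at least 50-fold molar excess over 3' ends across the whole range tested.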

[0262] These data provide additional evidence that dGTP is required for repair 
activity of terminal transferase and show that only a high concentration (50 µM and above) of
this nucleotide activates TdT-mediated repair of blocked 3' ends.

[0263] The results of Example 15 are important for defining conditions for the G-
tailing of DNA fragments produced by different physical and chemical methods that usually
have "bad" 3' ends. In particular, they provide (in combination with Example 14 and Example
16) a reasonable explanation of why thermo-fragmented DNA can be efficiently tailed with
dGTP/dATP mix but not with dATP in Example 13.

EXAMPLE 16: SPECIAL ROLE OF dGTP NUCLEOTIDE IN TAILING 
REACTION CATALYZED BY TERMINAL TRANSFERASE ON 3' END BLOCKED 
TEMPLATES 

[0264] This example demonstrates a unique role of the nucleotide dGTP in its 
ability to process the 3' end of an oligonucleotide with 3' Amino C7 blocking. 

[0265] In this example, four oligonucleotides were used: oligonucleotides OL1 
and OL4 (SEQ ID NO:32) with a fluorescein group at the 5' end and with a blocking group
Amino Modifier C7 at the 3' end; oligonucleotide OL5 (SEQ ID NO:33) with a fluorescein
group at the 5' end and with an OH group at the 3' end; and oligonucleotide OL7 with a
fluorescein group at the 5' end and with a dideoxy C (ddC) blocking group at the 3' end
(Table IV). Tailing reactions were performed using 10 pmol of an oligonucleotide and 10
units of terminal transferase (NEB) in 1x NEBuffer 4 containing 0.25 mM CoCl2 and 50 µM of
dXTP (where X is G, A, T or C) in a final volume of 50 µl for 20 min at 37°C. The reaction
was stopped by adding 5 µl of 0.5 M EDTA, pH 8.0. Samples were ethanol-precipitated and
then separated on a denaturing 15% polyacrylamide TBE-Urea gel. After electrophoresis, the 
gel was analyzed using Bio-Rad Fluor S Imager with Fluorescein filter and Quantity One 
software. 

[0266] FIG. 26A shows TdT-mediated repair/tailing of the AmMod C7 blocked 
oligonucleotide OL1 with different nucleotide triphosphates. The effect of tailing is only
observed with dGTP (FIG. 26, lane 1 vs. lane 5), while other nucleotides have no effect (FIG. 
26, lane 1 vs. lanes 2, 3, 4). 

[0267] FIG. 26B shows TdT-mediated repair/tailing of another Amino C7 blocked 
oligonucleotide OL4 in the presence of dGTP (FIG. 26B, lane 1 and lane 4). Interestingly, 
terminal transferase is unable to repair and tail the oligonucleotide OL7 (SEQ ID NO:35) 
with dideoxy C (ddC) blocking group at the 3' end (FIG. 26B, lane 3 and lane 6). Control 
terminal transferase tailing of non-blocked oligonucleotide OL5 with dGTP is shown in
FIG. 26B (lanes 2 and 5).

[0268] Obviously, dGTP plays a dual role in the tailing mechanism catalyzed by
terminal transferase on the 3' end blocked DNA substrates. First, it serves as a cofactor that
induces an end repair process that eliminates terminal blocked nucleotide(s), and, second, it
serves as a substrate for the tailing reaction. dGTP-induced repair activity of terminal transferase
is a novel property that has not previously been described.

EXAMPLE 17: MECHANISM OF THE 3' END REPAIR ACTIVITY OF
TERMINAL TRANSFERASE

[0269] This example shows that terminal transferase elongates 3' end blocked
templates by removing one or two nucleotides from the 3' end and then adding a
homopolymeric G-tail. Because dGTP tailing at a nucleotide concentration of 50-100 µM
(the concentration necessary for TdT repair activity; see Example 15) creates homopolymeric dG
tails 25-35 residues long, riboGTP is utilized in these experiments. Ribo NTPs can be
incorporated into DNA ends by terminal transferase as efficiently as their deoxy analogues,
with the only difference that the number of incorporated ribo-bases is limited to 1-5
nucleotides (Boule et al., 2001). The experiment described below confirms the underlying
assumption that ribo GTP can play the same repair activation role as dGTP does. 

[0270] Three 5' fluorescein-labeled oligonucleotides were tailed using TdT and 
ribo GTP: oligonucleotide OL4, 11 residues long, with an Amino C7 blocking group at the 3'
end; oligonucleotide OL5, 11 residues long, with a similar sequence but no blocking group at
the 3' end; and oligonucleotide OL6 (SEQ ID NO:34), 10 residues long (Table IV). In one
reaction set, 5 pmol of oligonucleotides OL4, OL5, and OL6 were incubated with 10 units of
terminal transferase (NEB) in 1x NEBuffer 4 containing 0.25 mM CoCl2 and 100 µM ribo
GTP. In another reaction set, 5 pmol of the oligonucleotide OL5 was incubated with 10 units
of terminal transferase (NEB) in 1x NEBuffer 4 containing 0.25 mM CoCl2 and four different
concentrations of ribo GTP (1, 5, 20 µM) in a final volume of 20 µl for 20 min at 37°C. The
reaction was stopped by adding 2 µl of 0.5 M EDTA, pH 8.0. Samples were ethanol
precipitated and then separated on denaturing 15% polyacrylamide TBE-Urea gel. After 
electrophoresis, the gel was analyzed using Bio-Rad Fluor S Imager with Fluorescein filter 
and Quantity One software. 

[0271] FIG. 27 shows that terminal transferase indeed repairs and adds ribo GTP 
nucleotides to the 3' end of Amino C7 blocked oligonucleotide OL4. Lanes 1 and 2 show the
oligonucleotide OL4 before and after ribo G-tailing, respectively. To determine the number
of nucleotides removed by TdT before adding a G-tail, we compared the lengths of
the ribo G-tailing products of blocked oligonucleotide OL4 (FIG. 27, lane 2) with the lengths of
the ribo G-tailing products of control oligonucleotide OL5 (11-mer) (FIG. 27, lanes 4, 8, 9 and
10) and oligonucleotide OL6 (10-mer) (FIG. 27, lane 6). Lane 7 represents the equimolar 
mixture of tailed oligo samples loaded on lanes 8, 9 and 10. Because ribo G-tailed products of 
the oligonucleotide OL4 migrate on the gel faster than corresponding products of the 10-mer 
oligonucleotide OL6 (compare lane 2 and lane 6), it is concluded that about 1 to 3 bases are
removed by 3' exonuclease activity of terminal transferase from the end of the 
oligonucleotide OL4 before adding the tail. 

EXAMPLE 18: LENGTH-CONTROLLED TAILING BY TERMINAL 
TRANSFERASE USING riboGTP / dGTP MIXTURES 

[0272] This example demonstrates that terminal transferase can be used for
addition of 2-10 guanine bases to the 3' ends of oligonucleotides, suggesting a controlled
TdT-mediated repair/tailing procedure for preparing a TRF library.

[0273] Oligonucleotide OL5, 11 residues long, with a fluorescein group at the 5' 
end (Table IV) was tailed with terminal transferase at different riboGTP/dGTP ratios in the 
presence and absence of thermally fragmented DNA. Five pmol of this oligonucleotide and 
100 ng of thermally fragmented DNA or just 5 pmol oligonucleotide were incubated with 10 
units of terminal transferase (NEB) in 1x NEBuffer 4 containing 0.25 mM CoCl2, 100 µM
riboGTP and varying concentrations of dGTP (0, 10, 20 and 50 µM) in a final volume of 20
µl for 20 min at 37°C. The reaction was stopped by adding 2 µl of 0.5 M EDTA, pH 8.0.
Samples were ethanol-precipitated and then separated on a denaturing 15% polyacrylamide 
TBE-Urea gel. After electrophoresis, the gel was analyzed using Bio-Rad Fluor S Imager 
with Fluorescein filter and Quantity One software. 

[0274] FIG. 28 shows the result of TdT tailing with riboGTP/dGTP mixtures. 
Lane 2 shows the mobility of non-processed oligonucleotide OL5. Incubation of the 
oligonucleotide OL5 with TdT and 100 µM riboGTP produces tails of 3-4 G bases (FIG. 28,
lane 1). Addition of dGTP at 10, 20 or 50 µM concentration results in homopolymeric tails
containing on average 6, 8 or 10 mixed riboG/dG residues, respectively (FIG. 28, lanes 3, 5,
7). The presence of thermally fragmented genomic DNA slightly reduced the average length of
tails (FIG. 28, lanes 4, 6, 8). Taking into account the fact that both dGTP and riboGTP
stimulate the 3' exonuclease activity of the terminal transferase at high nucleotide
concentration (Examples 14-17), it is reasonable to speculate that similar tails are added to 3'
ends of genomic DNA. 
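
The tail lengths reported in this example can be turned into a rough planning aid. The sketch below linearly interpolates between the four reported points (100 µM riboGTP plus 0, 10, 20 or 50 µM dGTP) to estimate a dGTP concentration for a target average tail length. Linear interpolation, the function name `estimate_dgtp_uM`, and the use of 3.5 as the midpoint of the reported 3-4 residues are all assumptions made for illustration; the source reports only the four points.

```python
# (dGTP in uM, average tail length in G residues) at 100 uM riboGTP.
# 3.5 is the midpoint of the reported 3-4 residues (an assumption).
REPORTED_POINTS = [(0.0, 3.5), (10.0, 6.0), (20.0, 8.0), (50.0, 10.0)]


def estimate_dgtp_uM(target_tail: float) -> float:
    """Linearly interpolate the dGTP concentration for a target tail length."""
    lo_tail = REPORTED_POINTS[0][1]
    hi_tail = REPORTED_POINTS[-1][1]
    if not lo_tail <= target_tail <= hi_tail:
        raise ValueError("target lies outside the reported 3.5-10 residue range")
    # Walk consecutive pairs of points and interpolate within the matching one.
    for (c0, t0), (c1, t1) in zip(REPORTED_POINTS, REPORTED_POINTS[1:]):
        if t0 <= target_tail <= t1:
            return c0 + (target_tail - t0) * (c1 - c0) / (t1 - t0)
    raise AssertionError("unreachable: points are sorted by tail length")


print(estimate_dgtp_uM(7.0))
```

The points come from reactions with the free oligonucleotide; because the presence of thermally fragmented genomic DNA slightly reduced tail length, any value obtained this way is at most a starting estimate to be checked empirically.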

[0275] Thus, this example provides a guideline for controlled G-tailing of DNA 
fragments produced by thermo-fragmentation, mechanical shearing or any other means that 
result in DNA ends lacking a 3' hydroxyl group.

[0277] Although the present invention and its advantages have been described in
detail, it should be understood that various changes, substitutions and alterations can be made 
herein without departing from the spirit and scope of the invention as defined by the 
appended claims. Moreover, the scope of the present application is not intended to be limited 
to the particular embodiments of the process, machine, manufacture, composition of matter, 
means, methods and steps described in the specification. As one of ordinary skill in the art 
will readily appreciate from the disclosure of the present invention, processes, machines, 
manufacture, compositions of matter, means, methods, or steps, presently existing or later to 
be developed that perform substantially the same function or achieve substantially the same 
result as the corresponding embodiments described herein may be utilized according to the 
present invention. Accordingly, the appended claims are intended to include within their 
scope such processes, machines, manufacture, compositions of matter, means, methods, or 
steps.

We claim:

1. A method of preparing a DNA molecule, comprising:

obtaining a DNA molecule; 

randomly fragmenting the DNA molecule to produce DNA fragments; 

attaching a primer having substantially known sequence to at least one end 
of a plurality of the DNA fragments to produce primer-linked fragments; 
and 

amplifying a plurality of the primer-linked fragments. 

2. The method of claim 1, further comprising concomitantly sequencing the 
plurality of primer-linked fragments. 

3. The method of claim 1, wherein said randomly fragmenting of the DNA 
molecule is by mechanical fragmentation. 

4. The method of claim 3, wherein said mechanical fragmentation of the DNA is 
by hydrodynamic shearing, sonication, or nebulization. 

5. The method of claim 1, wherein said randomly fragmenting of the DNA 
molecule is by chemical fragmentation. 

6. The method of claim 5, wherein said chemical fragmentation is by acid 
catalytic hydrolysis, alkaline catalytic hydrolysis, hydrolysis by metal ions, hydroxyl radicals, 
irradiation, or heating. 

7. The method of claim 5, wherein said chemical fragmentation is by heating. 

8. The method of claim 5, wherein said heating is to a temperature of between 
about 40°C and 120°C. 

9. The method of claim 5, wherein said heating is to a temperature of between 
about 80°C and 100°C. 

10. The method of claim 5, wherein said heating is to a temperature of between 
about 90°C and 100°C. 

11. The method of claim 5, wherein said heating is to a temperature of between 
about 92°C and 98°C. 

12. The method of claim 5, wherein said heating is to a temperature of between about 93°C
and 97°C. 

13. The method of claim 5, wherein said heating is to a temperature of between 
about 94°C and 96°C.

14. The method of claim 5, wherein said heating is to a temperature of about
95°C.

15. The method of claim 5, wherein said heating of the DNA molecule is in a 
solution having from 0 to about 100 mM concentration of a salt. 

16. The method of claim 5, wherein said heating is in a solution having from 
about 0 to about 10 mM concentration of salt. 

17. The method of claim 5, wherein said heating is in a solution having from 
about 0.1 to about 1 mM concentration of salt. 

18. The method of claim 5, wherein said heating is in a solution having from 
about 0.1 to about 0.5 mM concentration of salt.

19. The method of claim 5, wherein said heating is in a solution of 10 mM Tris, 
pH 8.0; 1 mM EDTA. 

20. The method of claim 5, wherein said heating is in a solution of water. 

21. The method of claim 1, wherein said randomly fragmenting of the DNA 
molecule is by enzymatic fragmentation. 

22. The method of claim 21, wherein said enzymatic fragmentation comprises 
digestion with DNAse I. 

23. The method of claim 22, wherein said DNAse I digestion is in the presence of 
Mg2+ ions.

24. The method of claim 23, wherein the concentration of said Mg2+ is about 1 mM
to about 10 mM.

25. The method of claim 22, wherein said DNAse I digestion is in the presence of
Mn2+ ions.

26. The method of claim 25, wherein the concentration of said Mn2+ is about 1 mM
to about 10 mM.

27. The method of claim 1, wherein said primer is attached to at least one 3' end
of at least one DNA fragment.

28. The method of claim 27, wherein said attachment of a primer having
substantially known sequence to at least one 3' end of at least one DNA fragment comprises
generation of a homopolymer extension of said DNA fragment. 

29. The method of claim 28, wherein said homopolymeric extension is generated 
by terminal deoxynucleotidyltransferase. 

30. The method of claim 29, wherein said homopolymeric extension comprises a 
polyG tract.

31. The method of claim 27, wherein said attachment of a substantially known
sequence to at least one 3' end of at least one DNA fragment comprises ligation of an adaptor
molecule to at least one end of the DNA fragment. 

32. The method of claim 31, wherein said adaptor comprises at least one blunt
end.

33. The method of claim 32, wherein said adaptor comprises a single stranded
region.

34. The method of claim 1, wherein said method further comprises generation of 
at least one blunt end of said DNA fragments. 

35. The method of claim 34, wherein said blunt end is generated by T4 DNA 
polymerase, Klenow, or a combination thereof. 

36. A method of preparing a library of DNA molecules, comprising: 

obtaining a plurality of DNA molecules; 

randomly fragmenting at least one of the DNA molecules to produce DNA 
fragments; 

attaching a primer having a substantially known sequence to at least one 
end of a plurality of the DNA fragments to produce primer-linked 
fragments; and 

amplifying a plurality of the primer-linked fragments. 

37. The method of claim 36, further comprising concomitantly sequencing the 
plurality of primer-linked fragments. 

38. The method of claim 36, wherein said randomly fragmenting of the DNA 
molecule is by mechanical fragmentation. 

39. The method of claim 38, wherein said mechanical fragmentation of the DNA 
is by hydrodynamic shearing, sonication, or nebulization. 

40. The method of claim 36, wherein said randomly fragmenting of the DNA 
molecule is by chemical fragmentation. 

41. The method of claim 40, wherein said chemical fragmentation is by acid 
catalytic hydrolysis, alkaline catalytic hydrolysis, hydrolysis by metal ions or complexes, 
hydroxyl radicals, irradiation, or heating. 

42. The method of claim 40, wherein said chemical fragmentation is by heating.

43. The method of claim 42, wherein said heating is to a temperature of between
about 40°C and 120°C. 

44. The method of claim 42, wherein said heating is to a temperature of between 
about 80°C and 100°C.

45. The method of claim 42, wherein said heating is to a temperature of between 
about 90°C and 100°C. 

46. The method of claim 42, wherein said heating is to a temperature of between 
about 92°C and 98°C. 

47. The method of claim 42, wherein said heating is to a temperature of between about
93°C and 97°C. 

48. The method of claim 42, wherein said heating is to a temperature of between 
about 94°C and 96°C. 

49. The method of claim 42, wherein said heating is to a temperature of about
95°C.

50. The method of claim 42, wherein said heating of the DNA molecule is in a 
solution having from 0 to about 100 mM concentration of a salt. 

51. The method of claim 42, wherein said heating is in a solution having from 
about 0 to about 10 mM concentration of salt. 

52. The method of claim 42, wherein said heating is in a solution having from 
about 0.1 to about 1 mM concentration of salt. 

53. The method of claim 42, wherein said heating is in a solution having from 
about 0.1 to about 0.5 mM concentration of salt. 

54. The method of claim 42, wherein said heating is in a solution of 10 mM Tris, 
pH 8.0; 1 mM EDTA. 

55. The method of claim 5, wherein said heating is in a solution of water. 

56. The method of claim 36, wherein said randomly fragmenting of the DNA 
molecule is by enzymatic fragmentation. 

57. The method of claim 56, wherein said enzymatic fragmentation comprises 
digestion with DNAse I. 

58. The method of claim 57, wherein said DNAse I digestion is in the presence of 
Mg2+ ions.

59. The method of claim 57, wherein said DNAse I digestion is in the presence of 
Mn2+ ions.

60. The method of claim 59, wherein said primer is attached to at least one 3' end
of at least one DNA fragment. 

61. The method of claim 57, wherein said attachment of the primer having 
substantially known sequence to at least one 3' end of at least one DNA fragment comprises
generation of a homopolymer extension of said DNA fragment. 

62. The method of claim 61, wherein said homopolymeric extension is generated 
by terminal deoxynucleotidyltransferase. 

63. The method of claim 62, wherein said homopolymeric extension comprises a 
polyG tract. 

64. The method of claim 36, wherein said attachment of a primer having 
substantially known sequence to at least one 3' end of at least one DNA fragment comprises
ligation of an adaptor molecule to at least one end of the DNA fragment. 

65. The method of claim 64, wherein said adaptor comprises at least one blunt
end.

66. The method of claim 64, wherein said adaptor comprises a single stranded
region.

67. The method of claim 66, wherein said method further comprises generation of 
at least one blunt end of said DNA fragments. 

68. The method of claim 67, wherein said blunt end is generated by T4 DNA 
polymerase, Klenow, or a combination thereof. 

69. A library generated by the method of claim 36. 

70. A method of generating a library of DNA templates, comprising: 

obtaining a plurality of DNA molecules; 

randomly fragmenting the plurality of DNA molecules to produce DNA 
fragments; 

attaching a first primer having substantially known sequence to at least one 
end of a plurality of the DNA fragments to produce primer-linked 
fragments; and 

amplifying a plurality of the primer-linked fragments, wherein the 
amplification utilizes: 

a second primer complementary to a known sequence in the DNA 
fragments; and

a third primer complementary to the first primer.

71. The method of claim 70, further comprising the step of sequencing 
concomitantly said plurality of DNA fragments using a fourth primer complementary to said 
known sequence in the DNA fragments. 

72. The method of claim 71, wherein said fourth primer is said second primer.

73. A library generated by the method of claim 70. 

74. A method of sequencing a plurality of DNA fragments concomitantly, 
comprising: 

obtaining a plurality of DNA molecules; 

randomly fragmenting the DNA molecules to generate a plurality of DNA 
fragments having overlapping sequences;

attaching a first primer having a substantially known sequence to at least 
one end of the plurality of the DNA fragments to produce primer-linked 
fragments; and 

amplifying a plurality of the primer-linked fragments, wherein the 
amplification utilizes: 

a second primer complementary to a known sequence in the DNA 
fragments; and 

a third primer complementary to the first primer; and 

sequencing said plurality of DNA fragments using a fourth primer 
complementary to said known sequence in the DNA fragments. 

75. The method of claim 74, wherein said fourth primer is said second primer. 

76. A method of sequencing a consecutive overlapping series of nucleic acid 
sequences, comprising the steps of: 

obtaining a plurality of DNA molecules having overlapping sequences; 

concomitantly sequencing a first region in said plurality of DNA molecules 
using a primer complementary to a known sequence in said plurality of 
DNA molecules; and 

concomitantly sequencing a second region in said plurality of DNA 
molecules using a primer complementary to sequence determined from the 
sequencing of the first region,

wherein the next consecutive sequencing of a region in the overlapping
series of nucleic acid sequences is produced by initiating sequencing from 
the sequence obtained in a preceding overlapping sequencing product. 

77. The method of claim 76, wherein said obtaining step is further defined as 

randomly fragmenting at least one parent DNA molecule to generate a 
plurality of DNA fragments having overlapping sequences; 

attaching a first primer having a substantially known sequence to at least 
one end of the plurality of the DNA fragments to produce primer-linked 
fragments; and 

amplifying a plurality of the primer-linked fragments, wherein the 
amplification utilizes: 

a second primer complementary to a known sequence in the DNA 
fragments; and 

a third primer complementary to the first primer. 

78. A method of sequencing a plurality of DNA molecules, comprising: 

obtaining said plurality of DNA molecules by randomly fragmenting a 
parent DNA molecule; 

sequencing concomitantly said plurality of DNA molecules with a primer 
complementary to a known sequence in said plurality of molecules. 

79. The method of claim 78, wherein said method further comprises amplification 
of the plurality of DNA molecules. 

80. The method of claim 79, wherein said amplification is further defined as: 

attaching a first primer having a substantially known sequence to at least 
one end of the plurality of the DNA fragments to produce primer-linked 
fragments; and 

amplifying a plurality of the primer-linked fragments, wherein the 
amplification utilizes:

a second primer complementary to a known sequence in the DNA 
fragments; and 

a third primer complementary to the first primer.

81. A method of preparing a DNA molecule having sequences that generate
secondary structure in said molecule, comprising: 

obtaining the DNA molecule having said sequences; 

randomly fragmenting the DNA molecule to produce a plurality of DNA 
fragments, wherein the plurality of DNA fragments comprises DNA 
fragments having part or all of the sequences which generate the secondary 
structure; 

attaching a primer having substantially known sequence to at least one end 
of a plurality of the DNA fragments to produce primer-linked fragments;
and 

amplifying a plurality of the primer-linked fragments. 

82. The method of claim 81, further comprising concomitantly sequencing the 
plurality of primer-linked fragments. 

83. The method of claim 81, wherein said plurality of DNA fragments further 
comprises DNA fragments having none of the sequences which generate the secondary 
structure. 

84. The method of claim 81, wherein said secondary structure is a hairpin, a G 
quartet, or a triple helix. 

85. The method of claim 1, wherein the obtained DNA molecule comprises 
genomic DNA, BAC DNA, or plasmid DNA. 

86. A method of conditioning a 3' end of a DNA molecule, comprising exposing
said 3' end to terminal deoxynucleotidyltransferase.

87. The method of claim 86, wherein said terminal deoxynucleotidyltransferase is
further defined as comprising 3' exonuclease activity.

88. The method of claim 86, wherein said exposing step further comprises 
providing a guanine ribonucleotide, guanine deoxyribonucleotide, or both. 

89. A method of providing 3' exonuclease activity to the end of a DNA molecule
comprising the step of introducing terminal deoxynucleotidyltransferase to the end of said 
molecule. 

90. The method of claim 89, wherein said introducing step further comprises 
providing a guanine ribonucleotide, guanine deoxyribonucleotide, or both. 

91. A method of preparing a probe, comprising:

obtaining at least one DNA molecule;

randomly fragmenting the DNA molecule to produce DNA fragments; 

attaching a labeled primer having substantially known sequence to at least 
one end of a plurality of the DNA fragments to produce labeled primer- 
linked fragments; and 

amplifying a plurality of the primer-linked fragments. 

92. The method of claim 91, wherein said attaching step of a labeled primer 
comprises generation of a homopolymer extension of said DNA fragment, wherein said 
extension comprises the label. 

93. The method of claim 92, wherein said homopolymeric extension is generated 
by terminal deoxynucleotidyltransferase. 

94. The method of claim 91, wherein said attaching step of a labeled primer 
comprises ligation of an adaptor molecule to at least one end of the DNA fragment, wherein 
the adaptor molecule comprises the label. 

95. The method of claim 91, wherein the label comprises a radionuclide, an 
affinity tag, a hapten, an enzyme, a chromophore, or a fluorophore. 

96. A labeled probe generated from the method of claim 91. 

97. A kit comprising a probe generated from the method of claim 91.

98. A method of repairing a 3' end of at least one single stranded DNA molecule, 
comprising providing to said 3' end a terminal deoxynucleotidyltransferase.

99. The method of claim 98, wherein said providing step further comprises 
providing a guanine ribonucleotide, guanine deoxyribonucleotide, or both. 

100. A kit for repairing a 3' end of at least one single stranded DNA molecule, 
wherein said kit comprises a terminal deoxynucleotidyltransferase. 

101. The kit of claim 100, wherein said kit comprises a guanine ribonucleotide, 
guanine deoxyribonucleotide, or both. 

102. A method of detecting a damaged DNA molecule, comprising the step of 
providing to said damaged DNA molecule terminal deoxynucleotidyltransferase and a labeled 
guanine ribonucleotide, labeled guanine deoxyribonucleotide, or both. 

103. The method of claim 102, wherein the damaged DNA molecule comprises a 
nick, a double stranded break, or both.

104. The method of claim 102, wherein the providing step is further defined as
providing repair to said damaged DNA molecule. 

105. The method of claim 102, wherein said label comprises a radionuclide, an 
affinity tag, a hapten, an enzyme, a chromophore, or a fluorophore. 

106. The method of claim 102, wherein said damaged DNA is outside a cell. 

107. The method of claim 106, wherein said damaged DNA is the result of 
radiation, ultraviolet light, oxygen, a radical, a metal ion, a nuclease, or mechanical force. 

108. The method of claim 102, wherein said damaged DNA is in a cell. 

109. The method of claim 108, wherein said cell is an apoptotic cell. 

110. The method of claim 108, wherein said damaged DNA is the result of 
radiation, heat, ultraviolet light, oxygen, radicals, nitric oxide, catecholamine, or a nuclease. 